Molecular computing elements: gates and flip-flops

ABSTRACT

This invention relates to novel molecular constructs that act as various logic elements, i.e., gates and flip-flops. The constructs are useful in a wide variety of contexts including, but not limited to, computation and control systems. The basic functional unit of the construct comprises a nucleic acid having at least two protein binding sites that cannot be simultaneously occupied by their cognate binding proteins. This basic unit can be assembled in a number of formats, providing molecular constructs that act like traditional digital logic elements (flip-flops, gates, inverters, etc.).

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] [Not Applicable]

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

[0002] [Not Applicable]

FIELD OF THE INVENTION

[0003] This invention relates to novel molecular constructs that act as various logic elements, i.e., gates and flip-flops. The constructs are useful in a wide variety of contexts including, but not limited to, computation and control systems.

BACKGROUND OF THE INVENTION

[0004] The history of computational devices reveals a progression from larger and slower to smaller and faster devices. Huge stepwise advances in this progression have accompanied significant changes in the underlying technology. Thus, for example, vast increases in computational speed accompanied the transition from mechanical, hand-operated devices such as the abacus and the hand-operated cash register or calculator to electrically driven mechanical computers (e.g., the electric cash register/calculator). Similarly significant increases in speed and decreases in size accompanied the shift from mechanically based devices to tube-based electronic computers, again with the shift from tube-based electronic computers to transistor-based electronic computers, and yet again with the shift from discrete transistor circuits to integrated circuits to large scale integrated (LSI) circuits.

[0005] The continually decreasing size and increasing speed of large scale integrated electronic devices have recently provoked increased interest and concern regarding the theoretical and practical limits of this progression. Such theoretical limits are affected by the inherent noise in electronic systems, the need to dissipate heat across ever-decreasing surface areas as the feature size of various elements decreases, and the “anomalous” behavior of devices as their physical size decreases to a point at which quantum mechanical rather than macroscopic properties predominate. (It will be noted, however, that the emergence of quantum mechanical properties at small feature size may provide the basis for quantum computing devices, and this field is receiving considerable interest.) Practical limits are imposed by costs and difficulties in predictable and reliable microfabrication.

[0006] Another approach to improving computational power and/or efficiency has involved replacing “linear” computing systems, in which a single processor sequentially performs the necessary operations in a calculation, with “parallel” computing systems, in which components of each calculation are distributed across two or more processing elements. Parallel computing systems can achieve vast savings in computational time. For example, an algorithm running on 100 computing elements in parallel can in principle run about 100 times faster than the same algorithm on a single element that must process each operation sequentially. Of course, the actual gain in efficiency is less than 100-fold. Some time is lost in parsing the algorithm among the various computing elements and in integrating their results, and some elements may have to wait for other elements to complete their calculations before the next operation can proceed. Nevertheless, massively parallel systems have been able to solve problems (e.g., identifying large prime numbers) that could not practically be solved on linear computer systems.

[0007] The combination of the two approaches, massive parallelism and small computational element size, has given birth to the field of molecular computing. This is illustrated in the seminal paper by Adleman (1994) Science 266: 1021-1024, in which molecular biological tools were used to solve an instance of the directed Hamiltonian path problem. In particular, Adleman encoded the problem (a directed graph) into nucleic acid sequences and then performed a series of ligations that ultimately produced an encoded solution, which could then be decoded. Following Schneider (1991) J. Theoret. Biol., 148: 125, Adleman suggested that such molecular systems could demonstrate remarkable energy efficiency, with a theoretical maximum of 34×10¹⁹ operations per joule, while conventional supercomputers execute at most 10⁹ operations per joule.
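As a worked check of the energy figure, the sketch below assumes that the 34×10¹⁹ operations-per-joule bound corresponds to the thermodynamic minimum of kT ln 2 joules dissipated per irreversible bit operation at about 300 K. That reading is our assumption and is not stated in the text. The function name `max_ops_per_joule` is hypothetical.

```python
import math

# Boltzmann constant in J/K (exact SI value since the 2019 redefinition)
K_B = 1.380649e-23

def max_ops_per_joule(temperature_k: float = 300.0) -> float:
    """Upper bound on irreversible bit operations per joule, assuming each
    operation must dissipate at least kT ln 2 joules (assumed bound)."""
    energy_per_op = K_B * temperature_k * math.log(2)
    return 1.0 / energy_per_op

if __name__ == "__main__":
    bound = max_ops_per_joule(300.0)
    print(f"{bound:.2e} operations per joule")  # about 3.48e+20, i.e. ~34 x 10^19
    print(f"{bound / 1e9:.1e} times the 10^9 ops/J quoted for supercomputers")
```

Under this assumption the bound scales as 1/T, so the comparison with electronic hardware depends on the operating temperature.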

[0008] Adleman recognized that DNA molecular computing imposed certain difficulties and limitations, particularly on the encoding of various problems. He also recognized that conventional electronic computers have an advantage in the variety of operations they provide and the flexibility with which these operations can be applied. He did, however, note that molecular computations may be advantageous for certain intrinsically complex problems, such as the directed Hamiltonian path problem. These are problems for which existing electronic computers are very inefficient and for which massively parallel searches can be organized to take advantage of the operations provided by molecular biology.

[0009] As indicated by Adleman, one limitation of prior molecular computation systems has been the lack of a variety of operations and the flexibility with which they may be applied.

SUMMARY OF THE INVENTION

[0010] This invention overcomes a number of these limitations by providing molecular logic devices that operate in a manner analogous to their electronic counterparts and thus provide a wide variety of operations. Thus, in one embodiment, this invention provides molecular bistable elements (flip-flops) and a wide variety of logic elements (gates) such as the AND, OR, NAND, NOR, NOT gates and others.

[0011] The central operational element of these devices is a nucleic acid having two or more protein binding sites (e.g., a first protein binding site and a second protein binding site). The sites are arranged so that the following two conditions hold. When the first protein binding site is specifically bound by a protein, the second binding site cannot be bound by a protein that otherwise specifically recognizes and binds the second binding site. When the second binding site is specifically bound by a protein, the first binding site cannot be bound by a protein that otherwise specifically recognizes and binds the first binding site. The binding sites are thus mutually exclusive. The nucleic acid can be single- or double-stranded; however, double-stranded nucleic acids (e.g., DNA) are preferred. The first and the second binding sites can have the same or different nucleotide sequences. In one preferred embodiment the first and second binding sites are the same and have the nucleotide sequence of SEQ ID NO: 1 described herein.
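The mutual exclusion described above can be captured as a small state model. The sketch below is a hypothetical abstraction: the class `MolecularFlipFlop` and its methods are our own names and are not part of the invention. It ignores binding kinetics and treats site occupancy as all-or-none.

```python
from typing import Optional

class MolecularFlipFlop:
    """Two protein binding sites (1 and 2) that cannot be occupied at once."""

    def __init__(self) -> None:
        self.bound_site: Optional[int] = None  # None, 1, or 2

    def bind(self, site: int) -> bool:
        """Try to bind a cognate protein at `site`; fail if the other site is occupied."""
        if site not in (1, 2):
            raise ValueError("site must be 1 or 2")
        if self.bound_site is not None and self.bound_site != site:
            return False  # exclusion: the partner site is already occupied
        self.bound_site = site
        return True

    def unbind(self, site: int) -> None:
        """Remove the protein from `site` if it is bound there."""
        if self.bound_site == site:
            self.bound_site = None

ff = MolecularFlipFlop()
assert ff.bind(1)       # state one
assert not ff.bind(2)   # blocked while site 1 is occupied
ff.unbind(1)
assert ff.bind(2)       # the flip-flop can now be switched to state two
```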

[0012] The binding sites can be chosen so that they are specifically recognized (bound) by any of the nucleic acid binding proteins described herein (e.g., Fis, modified EF-Tu, Tus, and LexA).

[0013] As indicated above, the binding sites are spaced so that they are mutually exclusive (only one can be bound at a time). The first binding site is preferably within 20 nucleotides (base pairs) of the second site, more preferably within 15 base pairs, and most preferably within 11 or fewer base pairs of the second site. Preferred binding sites have a strength of at least 2.4 bits as determined by individual information theory. The difference in strength between the two sites is at least 0 bits as determined by individual information theory.
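The preference tiers in this paragraph can be expressed as a simple classifier. This is only a sketch. Two points are assumptions: that spacing is given as an integer number of base pairs, and that each "within" bound is inclusive. The tier labels and function names are hypothetical.

```python
MIN_SITE_BITS = 2.4  # preferred minimum site strength (individual information, bits)

def spacing_preference(spacing_bp: int) -> str:
    """Classify the spacing between two sites by the preference tiers above."""
    if spacing_bp < 0:
        raise ValueError("spacing must be non-negative")
    if spacing_bp <= 11:
        return "most preferred"
    if spacing_bp <= 15:
        return "more preferred"
    if spacing_bp <= 20:
        return "preferred"
    return "outside preferred range"

def sites_strong_enough(ri_first: float, ri_second: float) -> bool:
    """True if both sites meet the preferred minimum strength."""
    return ri_first >= MIN_SITE_BITS and ri_second >= MIN_SITE_BITS

print(spacing_preference(11), sites_strong_enough(18.1, 18.1))
```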

[0014] The “flip-flop” may additionally include one or more selector binding sites (e.g., a third protein binding site). The selector binding site is in proximity to the first protein binding site or to the second protein binding site, such that specific binding of the third binding site (e.g., by a protein) precludes specific protein binding of the first or second protein binding site.

[0015] In one preferred embodiment the flip-flop comprises the above-described nucleic acid in which the first protein binding site is a Fis binding site, the second protein binding site is a Fis binding site, and the binding sites are separated from each other by less than 12 nucleotide base pairs. In a particularly preferred flip-flop the nucleic acid is a deoxyribonucleic acid comprising the sequence of SEQ ID NO: 2 or SEQ ID NO: 3 described herein.

[0016] In another embodiment this invention provides the various logic gates (NOR, OR, NOT, AND, NAND) described herein. The fundamental unit of these gates is the NOR gate. In one embodiment, the NOR gate is a composition comprising an isolated nucleic acid having a length of at least 5 base pairs. Its nucleotide sequence encodes a first protein binding site, a second protein binding site, and a third protein binding site. The protein binding sites are spaced in proximity to each other such that when either the first protein binding site or the third protein binding site is specifically bound by a nucleic acid binding protein, the second binding site cannot be bound by a nucleic acid binding protein that otherwise specifically recognizes and binds the second binding site. The first protein binding site and the third protein binding site, however, can simultaneously be specifically bound by a nucleic acid binding protein. The NOR gate can be in a state in which the first or third binding site is bound by a nucleic acid binding protein (e.g., Fis or any of the binding proteins described herein) and thus set in a HIGH state. Similarly, the second binding site can be bound by a nucleic acid binding protein, but not when either the first or the third site is bound.
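The blocking rules for the three-site NOR gate can be checked exhaustively with a short model. In this hypothetical sketch, the function name `nor_output_bindable` is our own. An input is True when its site is bound. The output is HIGH when the second (output) site can still be bound, following the convention in [0034] that a bindable site is HIGH.

```python
from itertools import product

def nor_output_bindable(first_bound: bool, third_bound: bool) -> bool:
    """The second (output) site is bindable only if neither flanking input site is bound."""
    return not (first_bound or third_bound)

# Truth table: inputs I1 and I2 (first and third sites) and output O (second site).
for i1, i2 in product([False, True], repeat=2):
    out = nor_output_bindable(i1, i2)
    print(int(i1), int(i2), "HIGH" if out else "LOW")
```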

[0017] The binding protein bound to the second binding site can be attached to an activator (e.g., a gene transactivator such as Gal4). In addition, the NOR gate can further comprise a gene or cDNA under the control of the activator. The gene or cDNA can encode virtually any structural protein. Thus, in one embodiment, the gene may be a reporter gene (e.g., FFlux, GFP, etc.), or in another embodiment the gene may encode a nucleic acid binding protein. The latter provides a method of coupling the output of one gate or flip-flop to the input of the same gate or flip-flop or to the input of another gate or flip-flop.

[0018] As with the flip-flop described above, in one embodiment, the underlying nucleic acid can be double-stranded (e.g., a DNA). The three binding sites comprising the NOR gate can all be different, in which case no selectors are necessary, although they optionally can be present. Alternatively, the first and third binding sites can have the same nucleotide sequence (i.e., bind the same protein with the same strength). In that case the NOR gate acts like a NOT gate (when I₁=I₂, NOR(I₁,I₂)=NOT(I₁)). In another embodiment, the first or third binding site and the second binding site can have the same nucleotide sequence. The binding sites can be chosen so that they are cognate binding sites for any particular binding protein. Preferred spacings between the first and second sites and between the second and third sites are as described above.
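The NOT behavior follows directly. When the first and third sites share a sequence, a single signal protein drives both inputs together. A minimal sketch, with hypothetical function names:

```python
def nor(a: bool, b: bool) -> bool:
    """Logical NOR of two input states."""
    return not (a or b)

def tied_input_not(x: bool) -> bool:
    """A NOR gate whose two input sites are identical behaves as an inverter."""
    return nor(x, x)

for x in (False, True):
    assert tied_input_not(x) == (not x)
```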

[0019] Preferred binding sites have a binding strength of at least 2.4 bits as determined by individual information theory. In one embodiment, the difference in strength between the first and third sites is at least 0 bits as determined by individual information theory. In a particularly preferred embodiment, the first protein binding site is a Fis binding site and the third protein binding site is a Fis binding site (e.g., the binding site of SEQ ID NO: 1).

[0020] In another embodiment this invention provides a composition for the storage of binary information. The preferred storage composition comprises any of the flip-flops described above having a nucleic acid binding protein bound to the first protein binding site or to the second protein binding site. The underlying nucleic acid can have restriction sites at one or both ends, and preferably has a different restriction site at each end. The restriction sites are preferably located so that when the binding site adjacent to a restriction site is occupied by a binding protein, a ligase is incapable of ligating the mating strand to that restriction site.

[0021] The storage composition can be free in solution or it can be attached to a solid support. The binding protein may be covalently linked to the underlying nucleic acid. The binding protein can be attached to a gene transactivator as described above. In addition, the storage composition can include one or more genes or cDNAs as described above that are preferably under control of the activator.

[0022] In still another embodiment, this invention provides a method of storing information. The method involves binding a nucleic acid binding protein to a first protein binding site on a nucleic acid comprising any of the above-described flip-flops or on a nucleic acid comprising any of the gates described herein. The method may further involve the step of determining which binding site on the nucleic acid is bound by said binding protein.

[0023] This invention also provides a method of transforming binary information. This method involves binding a nucleic acid binding protein to an input protein binding site on any one or more of the gates described herein and determining whether or not a nucleic acid binding protein can bind to an output protein binding site. The output binding site can be on the same or a different gate, and it can be on the same or a different nucleic acid. In a preferred embodiment, a nucleic acid comprising a gate used for this purpose has a length of at least 3 base pairs, preferably at least 5, more preferably at least 7, and most preferably at least 22 base pairs.

[0024] Definitions

[0025] The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.

[0026] The term “nucleic acid” refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form. Unless otherwise limited, it encompasses known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides. The term also includes nucleotides linked by peptide linkages, as in “peptide” nucleic acids.

[0027] The term “specifically binds”, as used herein, refers to a protein/nucleic acid interaction in which the protein or polypeptide binds strongly to a specific nucleic acid sequence pattern (nucleotide sequence) and less strongly to other nucleic acid patterns. For example, in a gel shift assay, specific binding will show a significant gel shift as compared to the gel shift shown by the same protein with other nucleic acid sequences of the same length.

[0028] The term “nucleic acid binding protein” is used herein to refer to a protein that specifically binds to a nucleic acid at a particular nucleotide sequence. Nucleic acid binding proteins include DNA binding proteins, mRNA binding proteins, tRNA binding proteins, and proteins that specifically bind modified or otherwise non-standard nucleic acids as described above. Nucleic acid binding proteins include, but are not limited to, DNA binding proteins such as Fis, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, and NF-kappa-B. They also include RNA binding proteins or protein/RNA complexes such as ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like. A large number of nucleic acid binding proteins are described in the TransFac database (ftp://transfac.gbf-braunschweig.de/pub/transfac/ascii/; see also Nucleic Acids Res. 25(1): 265-268 (1997)).

[0029] A “protein binding site” refers to a nucleotide sequence in a nucleic acid to which a particular nucleic acid binding protein specifically binds.

[0030] The terms “cognate protein” or “cognate binding site” refer to the protein that specifically binds to the binding site or to the binding site that is specifically bound by a particular binding protein, respectively.

[0031] A binding site “blocker”, “selector”, or “modulator” refers to a moiety that, when bound adjacent to, in proximity to, or on a binding site, partially or completely blocks binding of that site by its cognate nucleic acid binding protein.

[0032] The term “flip-flop” refers to a bistable device that exists in one or the other of two mutually exclusive states. Thus the molecular flip-flops of this invention have two binding sites, only one of which can be bound at a time.

[0033] The term “gate” is used to refer to a device that produces a particular (predetermined) output in response to one or more inputs. Thus, for example, an AND gate produces a HIGH output only when all inputs are HIGH. An OR gate produces a HIGH output when any input is HIGH and a LOW output only when all inputs are LOW. A NOT function returns a HIGH when its input is LOW and a LOW when its input is HIGH. Gates and their uses are well known to those of skill in the art (see, e.g., Horowitz and Hill (1990) The Art of Electronics, Cambridge University Press, Cambridge).

[0034] The term “state” is used to refer to the signal state of a particular binding site of a flip-flop or of a logic gate of this invention. A protein binding site that is protein bound or capable of being protein bound is said to be HIGH, while a binding site that is unbound and cannot be bound by a binding protein is said to be LOW.

[0035] The term “input” is used herein to refer to a binding site to which a signal may be applied in order to elicit an output. The signal itself (e.g., a signal polypeptide) may also be referred to as an input; the intended meaning will be clear from the context of usage. The term “input binding site” refers to a protein binding site that is used as an input.

[0036] The term “output” is used herein to refer to a binding site that is rendered capable or incapable of binding its cognate protein as a consequence of an input binding event or events. The term output can also refer to the state of the output binding site. The output can provide an input for another gate or flip-flop of this invention.

[0037] A “signal protein” is a nucleic acid binding protein that sets the (logical) state of a molecular flip-flop or of a molecular gate of this invention. As described herein, binding of a signal protein to a protein binding site on a nucleic acid sets the state of that binding site HIGH. A signal protein can also be used to read the state of the flip-flop or gate. In this latter context, where the protein is capable of binding an output binding site (i.e., the binding site is unblocked), the state of the output is said to be HIGH. Conversely, where the output binding site is blocked, the state is said to be LOW.

[0038] The term “setting the state”, when referring to a binding site, refers to selectively binding a signal protein to, or removing it from, a particular binding site. Where the signal protein is bound to the binding site, the state of that binding site is set HIGH. Conversely, where the signal protein is removed from the site, the state is set LOW. With respect to a flip-flop, setting the state refers to setting the flip-flop into one of its mutually exclusive stable states. Thus state one can be set by binding a signal protein to the first binding site, while state two can be set by binding a signal protein to the second binding site. Since the two states are mutually exclusive, setting the state at one site implicitly involves setting (or switching) the state at the other site.

[0039] The term “resetting the state” refers to changing the state (e.g., from HIGH to LOW or from LOW to HIGH) of an input binding site, an output binding site, or a flip-flop.

[0040] The term “GTPase-like” protein refers to a binding protein that can release a bound nucleic acid with the dissipation of energy (e.g., hydrolysis of an energy source such as GTP or ATP, or input of light). Such release may optionally be accomplished with the additional use of a co-factor. A GTPase-like protein includes naturally occurring GTPase-like proteins (including GTPases) as well as modified and non-natural GTPase-like proteins.

[0041] A “recombinant expression cassette” or simply an “expression cassette” is a nucleic acid construct, generated recombinantly or synthetically, with nucleic acid elements that are capable of effecting expression of a structural gene or genes in hosts compatible with such sequences. Expression cassettes include at least promoters and, optionally, transcription termination signals. Typically, the recombinant expression cassette includes a nucleic acid to be transcribed (e.g., a nucleic acid encoding a desired polypeptide) and a promoter. Additional factors necessary or helpful in effecting expression may also be used as described herein. For example, an expression cassette can also include nucleotide sequences that encode a signal sequence that directs secretion of an expressed protein from the host cell.

[0042] A “logic cassette” refers to an expression cassette in which the expression of one or more genes is under the control of one or more molecular gates or flip-flops of this invention.

[0043] The phrase “expression is under the control of”, when referring to a logic element (e.g., gate or flip-flop), indicates that changes in the state(s) (input and/or output) of the gate or flip-flop alter the expression level of the gene or genes under said control.

[0044] Similarly, a gene “operably linked” to, or “under the control” of, an activator refers to a gene whose expression is altered by the presence or absence of a particular activator.

[0045] A “tethered activator” refers to a gene activator (e.g., Gal4) bound directly or through a linker to a nucleic acid binding protein (e.g., LexA). The attachment can be by chemical conjugation or by recombinant expression of a fusion protein. In some instances a repressor can be used in place of the activator, and the term tethered activator is intended to encompass this possibility.

[0046] The term “binding strength” as used herein refers to binding strength as calculated using individual information theory (e.g., as described in Schneider (1997) J. Theoret. Biol., 189(4): 427-441) or as measured by binding energy (−ΔG).
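Individual information can be computed by summing per-base weights over the positions of a site. The sketch below uses the simplified weight Riw(b,l) = 2 + log₂ f(b,l), where f(b,l) is the frequency of base b at position l among known sites. Schneider's published method also applies a small-sample correction, which is omitted here. The three-position frequency matrix is invented for illustration and is not a real Fis model.

```python
import math
from typing import Dict, List

# Hypothetical frequency matrix: one dict of base frequencies per site position.
FREQS: List[Dict[str, float]] = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
]

def riw(base: str, position: int, freqs: List[Dict[str, float]]) -> float:
    """Per-base weight in bits: 2 + log2 f(b, l) (no small-sample correction)."""
    return 2.0 + math.log2(freqs[position][base])

def individual_information(site: str, freqs: List[Dict[str, float]]) -> float:
    """Ri of a site: the sum of the per-base weights over all positions."""
    if len(site) != len(freqs):
        raise ValueError("site length must match the matrix width")
    return sum(riw(b, l, freqs) for l, b in enumerate(site.upper()))

print(individual_information("AAG", FREQS))  # 1.0 + 0.0 + 1.0 = 2.0 bits
print(individual_information("GCT", FREQS))  # -1.0 + 0.0 - 1.0 = -2.0 bits
```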

[0047] The terms “isolated”, “purified”, or “biologically pure” refer to material which is substantially or essentially free from components which normally accompany it as found in its native state. In the case of a nucleic acid, an isolated nucleic acid is typically free of the nucleic acid sequences by which it is flanked in nature. An isolated nucleic acid can be reintroduced into a cell, and such “heterologous” nucleic acids are regarded herein as isolated. In addition, nucleic acids synthesized de novo or produced by cloning (e.g., recombinant DNA technology) are also regarded as “isolated”.

[0048] The term “sequence logo” refers to a graphical method for displaying the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. The height of each letter is proportional to its frequency, and the letters are sorted so the most common is on top. The height of the entire stack is then adjusted to signify the information content of the sequences at that position. From these “sequence logos” one can determine not only the consensus sequence, but also the relative frequency of bases and the information content (measured in bits) at every position in a site or sequence. The logo displays both significant residues and subtle sequence patterns. Sequence logos are described in detail in Schneider & Stephens (1990) Nucl. Acids Res., 18: 6097-6100 and Schneider (1996) Meth. Enzym., 274: 445-455.
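The stack height described here is the information content at one position. The sketch below makes two assumptions: the sequences are DNA, so the maximum is 2 bits, and the small-sample correction used in published logos is omitted. `logo_column` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

def logo_column(freqs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (base, letter height in bits) pairs, most common base first (top of stack).

    Stack height R = 2 - H, where H = -sum f log2 f is the Shannon uncertainty;
    each letter's height is its frequency times R.
    """
    uncertainty = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    stack_height = 2.0 - uncertainty
    heights = [(base, f * stack_height) for base, f in freqs.items()]
    return sorted(heights, key=lambda item: item[1], reverse=True)

print(logo_column({"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}))
# stack height 0.25 bits: A 0.125, C 0.0625, G 0.03125, T 0.03125
```

A perfectly conserved column yields a 2-bit stack, and a uniform column yields no stack at all.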

[0049] The term “sequence walker” refers to a graphical method for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line, indicating favorable contact, or turned upside-down and placed below the line, indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence “walkers” can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be placed simultaneously next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and comparing this to locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled. The calculation and use of “sequence walkers” are described in Schneider (1997) Nucl. Acids Res., 25: 4408-4415, and in copending application U.S. Ser. No. 08/494,115, filed on Jun. 23, 1995. The mathematics for walkers is given in Schneider (1997) J. Theor. Biol. 189(4): 427-441.
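Stepping a walker along a sequence amounts to evaluating Ri at every offset. The sketch below scans only the given strand, whereas real walker software also considers the complementary strand. It uses an invented three-position frequency matrix and the simplified weight 2 + log₂ f(b,l) without small-sample correction. It is illustrative only and is not the published implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical three-position frequency matrix (not a real binding-site model).
FREQS: List[Dict[str, float]] = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
]

def scan(sequence: str, freqs: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Return (offset, Ri in bits) for every window the matrix fits in."""
    width = len(freqs)
    seq = sequence.upper()
    results = []
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        ri = sum(2.0 + math.log2(freqs[l][b]) for l, b in enumerate(window))
        results.append((offset, ri))
    return results

hits = [pos for pos, ri in scan("TAAGC", FREQS) if ri > 0]
print(scan("TAAGC", FREQS))  # [(0, -1.0), (1, 2.0), (2, 0.0)]
print(hits)                  # [1]
```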

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] FIGS. 1a and 1b illustrate the two states of a basic molecular flip-flop. The horizontal line represents a nucleic acid, while the boxes labeled BS1 and BS2 represent protein binding sites. The circles labeled BP1 and BP2 represent binding proteins that bind to BS1 and BS2, respectively. The binding sites are situated so that when BP1 is bound to BS1, BS2 cannot be occupied. Conversely, when BP2 binds to BS2, BS1 cannot be occupied. BS1 and BS2 can be the same type of protein binding site or different kinds of protein binding sites (i.e., bind different proteins). Similarly, BP1 and BP2 can be the same or different kinds of binding proteins. The circles S1 and S2 represent optional “selectors”. These are binding sites that, when bound, block BS1 and BS2 and thereby allow selective setting of the state of the flip-flop.

[0051] FIG. 2 illustrates readout of a flip-flop using a nucleic acid readout molecule and a ligation reaction. The flip-flop is incubated with two readout molecules. One has an end complementary to one end of the flip-flop nucleic acid, and the other has an end complementary to the other end. The readout molecules, under hybridizing conditions, bind to the respective ends of the flip-flop nucleic acid. A ligase reaction is then run. The bound nucleic acid binding protein (Fis in FIG. 2) blocks access of the ligase to the site adjacent to the bound binding site (BS1 in FIG. 2), thereby preventing ligation of that readout molecule. The other readout molecule is successfully ligated to the nucleic acid and provides a signal indicating which binding site was not blocked.
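The readout logic of FIG. 2 can be summarized as follows. The readout molecule at the end next to the occupied site fails to ligate, so the ligated product identifies the free site. The hypothetical sketch below assumes that BS1 lies next to the left end and BS2 next to the right end of the flip-flop nucleic acid.

```python
from typing import Dict

# Assumed layout: which end of the flip-flop DNA lies next to each binding site.
ADJACENT_END = {"BS1": "left", "BS2": "right"}

def ligation_readout(bound_site: str) -> Dict[str, bool]:
    """Return, for each end, whether its readout molecule is ligated.

    A protein on a binding site blocks ligase access at the adjacent end.
    """
    if bound_site not in ADJACENT_END:
        raise ValueError("bound_site must be 'BS1' or 'BS2'")
    blocked_end = ADJACENT_END[bound_site]
    return {end: end != blocked_end for end in ("left", "right")}

print(ligation_readout("BS1"))  # {'left': False, 'right': True}
```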

[0052] FIG. 3 illustrates a molecular NOR gate, a NOT gate, gene activation under the control of a gate, signal coupling between two gates, and a molecular OR gate. Gate A acts as a simple NOR gate, with output O₁ providing a NOR response to inputs I₁ and I₂, as described herein. Gate B, in which I₃ and I₄ are identical, provides a molecular NOT gate, with O₂ providing a NOT response to inputs at I₃ or I₄, as described herein. Gate A also illustrates regulation of gene expression. When output O₁ is HIGH, protein BP1 can bind to that site. This anchors the tethered activator (A), which then activates expression of the gene.

[0053] In FIG. 3, the gene expresses a binding protein that can specifically bind to I₃ or I₄ of gate B, thereby illustrating the coupling of the output of gate A to the input of gate B. The output of gate B (O₂) produces an OR in response to inputs (I₁ or I₂) at gate A. Any of the illustrated binding sites may also optionally occur with a selector molecule.
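The coupling in FIG. 3 composes two NOR stages. A HIGH output at O₁ lets BP1 bind, the tethered activator drives expression of the signal protein, and that protein occupies the tied inputs of gate B. The sketch simplifies transcription and translation into an instantaneous boolean hand-off, which is an assumption. The function names are hypothetical.

```python
from itertools import product

def nor(a: bool, b: bool) -> bool:
    """Output site is bindable (HIGH) only when neither input site is bound."""
    return not (a or b)

def gate_a(i1: bool, i2: bool) -> bool:
    """Gate A: a plain NOR of inputs I1 and I2, read at O1."""
    return nor(i1, i2)

def gate_b(signal_expressed: bool) -> bool:
    """Gate B: identical inputs I3 and I4 both receive the expressed protein."""
    return nor(signal_expressed, signal_expressed)

def coupled_output(i1: bool, i2: bool) -> bool:
    """O2: O1 HIGH -> activator expresses the I3/I4 binding protein -> O2 LOW."""
    o1 = gate_a(i1, i2)
    return gate_b(o1)

for i1, i2 in product([False, True], repeat=2):
    assert coupled_output(i1, i2) == (i1 or i2)  # gate B's output is OR(I1, I2)
```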

[0054] FIG. 4 illustrates an AND and a NAND gate. Two NOR gates, gates A and B, provide inputs into a third NOR gate (gate C). This produces an output at O₃ that is an AND function of the inputs to gates A and B (I₁ and I₂). Inversion of the AND signal by NOT gate D results in a NAND at output O₄ in response to inputs at I₁ and I₂. Within each of gates A, B, and D the input sites are identical, so these NOR gates act as simple inverters.
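The wiring of FIG. 4 can be checked in the same way. Gates A and B invert I₁ and I₂. Gate C combines the inverted signals with a NOR, which by De Morgan's law is AND. Gate D inverts again to give NAND. The function names in this hypothetical sketch are our own.

```python
from itertools import product
from typing import Tuple

def nor(a: bool, b: bool) -> bool:
    """Basic molecular NOR: output bindable only when neither input is bound."""
    return not (a or b)

def inverter(x: bool) -> bool:
    """A NOR gate with identical input sites acts as a NOT gate."""
    return nor(x, x)

def fig4_circuit(i1: bool, i2: bool) -> Tuple[bool, bool]:
    """Return (O3, O4) for inputs I1 and I2."""
    a_out = inverter(i1)    # gate A
    b_out = inverter(i2)    # gate B
    o3 = nor(a_out, b_out)  # gate C: NOR(NOT I1, NOT I2) == I1 AND I2
    o4 = inverter(o3)       # gate D: NAND
    return o3, o4

for i1, i2 in product([False, True], repeat=2):
    o3, o4 = fig4_circuit(i1, i2)
    assert o3 == (i1 and i2)
    assert o4 == (not (i1 and i2))
```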

[0055] FIG. 5a illustrates a simplified AND gate in which the same binding site acts as both an input and an output for gates A and B.

[0056] FIG. 5b illustrates a second simplified AND gate utilizing five protein binding sites. The AND gate is illustrated with a tethered binding protein activating a gene.

[0057] FIG. 6 shows a resettable flip-flop utilizing GTPase-like proteins. The flip-flop consists of a nucleic acid having four binding sites, designated c, a₁, b, and a₂. Each site x can be bound by σ_x, a nucleic acid binding protein. The σ proteins are GTPase-like proteins that, when triggered, release the nucleic acid. The proteins also contain a GAP finger that can trigger the release of the adjacent σ protein but cannot trigger its own release. The σ_c protein binds site c and, because of its GAP finger, causes the removal of σ_a at site a₁. Thus, when the flip-flop is contacted with σ_a, σ_a remains bound only at site a₂. The flip-flop is thus stable at site a₂. To switch its state, the flip-flop is contacted with σ_b, which binds at site b, causing the removal of σ_a and leaving the flip-flop stably bound at site b, the second state of the flip-flop. The flip-flop can be reset to state a by contacting it with σ_a, which binds to sites a₁ and a₂. Site a₁ is bound long enough to cause the release of σ_b before the σ_a molecule at a₁ is removed by σ_c. This leaves the flip-flop reset to state a, with site a₂ bound by σ_a. The cycle can be repeated indefinitely.

[0058] FIG. 7 illustrates the self-similarity of Fis binding sites. The sequence logo for Fis (Schneider & Stephens (1990) Nucl. Acids Res., 18: 6097-6100; Hengen et al. (1997) Nucl. Acids Res., 25(24): 4994-5002) is shown three times. The upper and lower logos are shifted +11 and +7 bases to the right (respectively) relative to the middle logo. The cosine wave, with a wavelength of 10.6 bases, shows that the +11 relatively shifted Fis sites would be on the same face of the DNA, while the +7 relatively shifted Fis sites would be on opposite faces. Arrows are at positions where the logo is self-similar after a shift. Down arrows mean that the contacts by Fis to the bases would interfere because they would be on the same face of the DNA. Up arrows mean that the contacts could be simultaneous because they are on opposite faces.
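The face-of-the-helix argument reduces to the phase of a cosine with the stated 10.6-base period. In this minimal sketch the helper name is hypothetical. A positive value means the shifted site lies on roughly the same face of the DNA; a negative value means it lies on the opposite face.

```python
import math

HELICAL_PERIOD = 10.6  # bases per turn, as used for the cosine wave in FIG. 7

def helical_phase(shift_bases: float, period: float = HELICAL_PERIOD) -> float:
    """Cosine of the helical rotation between two positions shift_bases apart."""
    return math.cos(2 * math.pi * shift_bases / period)

for shift in (11, 7):
    value = helical_phase(shift)
    face = "same face" if value > 0 else "opposite faces"
    print(f"+{shift}: cos = {value:.2f} ({face})")
# +11: cos = 0.97 (same face)
# +7: cos = -0.53 (opposite faces)
```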

[0059] FIGS. 8a, 8b, and 8c illustrate the oligonucleotide design of overlapping and separated Fis binding sites. The predicted Fis sites are shown by walkers floating below each DNA sequence (Schneider (1997) Nucl. Acids Res., 25: 4408-4415; Hengen et al. (1997) supra). In a walker, the vertical box marks the zero base of the binding site. The box also shows the vertical scale, with the upper edge at +2 bits and the lower edge at −3 bits. The height of each letter is determined from the bit value in the Riw(b,l) matrix (Schneider (1997) J. Theoret. Biol., 189(4): 427-441; Schneider (1997) Nucl. Acids Res., 25: 4408-4415; Hengen et al. (1997) supra). Negative weights are represented by drawing the letter upside-down and placing it below the zero bit level. Three DNAs were designed, each having two Fis sites, spaced 11, 7, and 23 bases apart, respectively. Design details are given in Example 1, Materials and Methods. The total strength of a site is the sum of the information weights for each base. The 18.1 bit Fis sites are 3.4 standard deviations higher than the average Fis site in natural sequences (Hengen et al. (1997) supra; Schneider (1997) J. Theoret. Biol., 189(4): 427-441). The 12.7 and 15.0 bit sites are 1.6 and 2.4 standard deviations above average, respectively.

[0060] FIG. 9 illustrates the mobility shift experiments for 11 and 7 base pair overlapping and 23 base pair separated Fis sites. Successive lanes contain increasing concentrations of Fis protein, beginning with no Fis, then Fis diluted 1 to 64, etc. The 1:1 dilution is at 2200 nM Fis. This concentration was chosen intentionally so that, with 1 nM DNA per reaction, the protein/DNA ratio is 2-fold higher than that needed to strongly shift DNA containing the 8.9 bit wild-type hin distal Fis site (Bruist et al. (1987) Genes Dev. 1: 762-772). The sequences are given in FIG. 8. Marker lanes (M) contain 10 ng of biotinylated φX174 HinfI-digested DNA standards (Life Technologies, Inc.). Sizes are indicated in bp. The lowest band in most lanes of the figure is single-stranded oligonucleotide DNA. In the “Separated 23” experiment, at high concentrations, Fis proteins are apparently able to capture the single-stranded DNA when it has folded into a hairpin. This produces a faint band near the 100 bp marker.

[0061] FIG. 10 shows the positions of Fis and DnaA sites at the Escherichia coli origin of replication (oriC). Sequence data are from GenBank accession K01789. The horizontal dashes below the sequence represent regions protected by Fis. Locations of DnaA sites are from Messer et al. (1991) Res. Microbiol., 142: 119-125. The asymmetric DnaA individual information matrix was created from 27 experimentally demonstrated DnaA binding sites (data not shown). DNA synthesis start sites are indicated by the arrows at the bottom (Seufert & Messer (1987) EMBO J. 6: 2469-2472). The boxes mark two Fis sites separated by 11 bases. Fis sites with positive individual information are marked from −7 to +7 but evaluated from −10 to +10 according to the matrix. DnaA site directionality is indicated by letters turned sideways in the direction that DnaA binds (Schneider (1997) Nucl. Acids Res., 25: 4408-4415).

[0062]FIG. 11 shows the design of the oligonucleotide of Example 2.

DETAILED DESCRIPTION

[0063] This invention provides a novel nucleic acid/protein construct that characteristically can exist in either of two mutually exclusive states. In general, the construct, referred to herein as a “flip-flop”, comprises a nucleic acid having at least two protein binding sites. The binding sites are situated close to each other so that when a first site is bound by its cognate nucleic acid binding protein, the second site cannot be bound (e.g., due to steric hindrance). Conversely, when the second site is bound by its cognate nucleic acid binding protein, the first site cannot be bound. The flip-flop can thus exist in two mutually exclusive states: either the first binding site is bound or the second binding site is bound.

[0064] It will be appreciated that two-state elements such as the flip-flop described herein form the heart of a wide variety of digital information processing and control systems. In particular, it is explained herein that the flip-flops can act as static or dynamic data storage elements (i.e., each flip-flop acting as a bit, e.g., in a read-only memory). In addition, the flip-flops can be assembled into “logic” gates (e.g., AND, OR, NAND, NOR, NOT) that act as computational elements or that can be used to control cellular machinery (e.g., the expression of one or more genes). This invention thus provides novel methods for regulating gene expression in cells. In addition, the flip-flops of this invention can be used in sequential logic systems (i.e., as true resettable flip-flops) to provide a true molecular binary computational or control system.

[0065] I. Flip-Flops, Gates, and their Uses

[0066] A) Simple Data Storage: Read Only Memory (ROM).

[0067] The molecular flip-flops of this invention can be used for simple data storage. In effect, a molecular flip-flop consisting of a nucleic acid having two mutually exclusive protein binding sites has three discrete states: completely unbound, the first site bound, or the second site bound (see, e.g., FIGS. 1a and 1b). Such “flip-flops” could therefore be used to store information encoded in a trinary system.

[0068] However, given the general emphasis on binary storage, typically only two states will be used. These will preferably be either bound versus unbound, or site one bound versus site two bound (BS1 vs BS2 in FIGS. 1a and 1b). In the first instance, the unbound state could be designated zero and the bound state one, while in the second case, the first site bound could be designated zero and the second site bound designated one.

[0069] The state (unbound versus bound, or site one versus site two bound) of a single nucleic acid molecule, or of a multitude of such nucleic acid molecules, can be used to encode and store information.

[0070] For example, the origin of products can be tagged at a molecular level. Thus, if a product is from factory A, site one of the flip-flop may be unbound (e.g., state 0), while if a product is from factory B, site one of the flip-flop can be bound (e.g., state 1). A single protein/nucleic acid “flip-flop” thus stores 1 bit of information and, in this example, is able to indicate two different sites of origin. Of course, multiple “flip-flops” can be combined to form “registers”: n flip-flops can represent 2^n distinct values, so registers can encode a virtually limitless amount of information.
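
The register idea can be made concrete: n flip-flops distinguish 2^n values, so ⌈log₂ k⌉ flip-flops suffice to tag k origins. The sketch below assumes that state 1 means "site one bound", as in the factory A/B example, and uses a hypothetical list of factories and illustrative function names.

```python
from typing import List


def to_register(value: int, n_flip_flops: int) -> List[int]:
    """Encode value as flip-flop states (assumed: 1 = site one bound), most significant bit first."""
    if not 0 <= value < 2 ** n_flip_flops:
        raise ValueError(f"{n_flip_flops} flip-flops hold only 0..{2 ** n_flip_flops - 1}")
    return [(value >> i) & 1 for i in reversed(range(n_flip_flops))]


def from_register(states: List[int]) -> int:
    """Decode flip-flop states (most significant bit first) back into an integer."""
    value = 0
    for state in states:
        value = (value << 1) | state
    return value


# Hypothetical list of origins; ceil(log2(5)) = 3 flip-flops are needed.
FACTORIES = ["A", "B", "C", "D", "E"]
n_needed = max(1, (len(FACTORIES) - 1).bit_length())
tag = to_register(FACTORIES.index("D"), n_needed)  # [0, 1, 1]
assert FACTORIES[from_register(tag)] == "D"
```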

[0071] Readout can be easily accomplished by any of a number of means. For example, in one embodiment, the flip-flop nucleic acid will terminate with overhangs comprising restriction sites, and each end will comprise a different restriction site (e.g., an EcoRI overhang adjacent to binding site 1 and a HindIII overhang adjacent to binding site 2, as illustrated in FIG. 2). The flip-flop is then contacted with two “readout” molecules, each comprising a double-stranded nucleic acid ending in either an EcoRI overhang or a HindIII overhang, and a ligation reaction is performed. The ligase will be unable to react at the restriction site adjacent to the blocked (bound) site because of the interference afforded by the binding protein (e.g., Fis). Conversely, the ligase will react at the restriction site adjacent to the unblocked (unbound) site, thereby attaching the “readout” molecule having the matching restriction site. Thus, where binding site one is bound, the readout molecule will attach adjacent to binding site two, and where binding site two is bound, the readout molecule will attach adjacent to binding site one.
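
The ligation logic of FIG. 2 can be summarized as a small function: an end ligates only when the binding site next to it is free. This is a sketch assuming that a bound protein blocks ligation completely; the readout names are illustrative.

```python
from typing import Set


def ligated_readouts(site1_bound: bool, site2_bound: bool) -> Set[str]:
    """Return the readout molecules that can ligate to a two-site flip-flop.

    Assumes the EcoRI end sits next to binding site 1 and the HindIII end next to
    binding site 2 (FIG. 2), and that a bound protein fully blocks ligation at its end.
    """
    if site1_bound and site2_bound:
        raise ValueError("the two sites are mutually exclusive")
    ligated = set()
    if not site1_bound:
        ligated.add("EcoRI readout")
    if not site2_bound:
        ligated.add("HindIII readout")
    return ligated


print(ligated_readouts(True, False))  # {'HindIII readout'}
```

Under the same assumption, a completely unbound flip-flop accepts both readout molecules, so the third (unbound) state is also distinguishable in this assay.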

[0072] The readout molecule can be detected by any of a wide variety of means. For example, the two readout molecules can be labeled with distinguishable labels (e.g., fluorescent molecules of different colors). The readout molecules can optionally include a primer site to facilitate PCR amplification of a particular nucleic acid sequence that will only be amplified when the readout molecule is successfully ligated.

[0073] The nucleic acid of each flip-flop can also encode a unique identifier indicating which bit in the register or message is represented by that flip-flop. The PCR reaction can therefore simultaneously reveal both the identity (address) of the bit and its state. In another embodiment, the readout sequence can optionally provide a hybridization target for capture of the bit on a solid support (e.g., in a well of a microtiter or PCR plate).
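
If each flip-flop's sequence carries an address, readouts can be recovered in any order (e.g., as a pool of PCR products) and still be reassembled into the original register. The sketch assumes each readout has already been decoded into a hypothetical (address, state) pair; the function name is illustrative.

```python
from typing import Iterable, List, Optional, Tuple


def assemble_register(readouts: Iterable[Tuple[int, int]], n_bits: int) -> List[int]:
    """Rebuild a register from (address, state) pairs recovered in any order."""
    bits: List[Optional[int]] = [None] * n_bits
    for address, state in readouts:
        if bits[address] is not None and bits[address] != state:
            raise ValueError(f"conflicting readouts for bit {address}")
        bits[address] = state
    missing = [i for i, b in enumerate(bits) if b is None]
    if missing:
        raise ValueError(f"no readout for bit(s) {missing}")
    # Every entry is set at this point; the filter only narrows the type.
    return [b for b in bits if b is not None]
```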

[0074] The flip-flop can be provided free in solution or it can be anchored to a solid support (e.g., via a biotin/streptavidin interaction). When anchored, the anchor can be situated so that both ends of the nucleic acid are free (e.g., for the use of two readout molecules), or the flip-flop can be anchored through one end.

[0075] Where one end of the flip-flop is anchored, readout can be accomplished with a single readout molecule having a restriction site complementary to the free end. Successful ligation will indicate that the binding site adjacent to the free end is unbound, while ligation failure will indicate that the binding site adjacent to the free end is occupied (bound).

[0076] In another embodiment, the memory can simply consist of a nucleic acid having a single protein binding site and a single protein. In this embodiment, the bound nucleic acid will indicate one state, while the unbound nucleic acid will indicate the other state. This memory can be read by a number of means as described herein. Again, in a preferred embodiment, readout can be accomplished by a ligation reaction. In this case, the nucleic acid molecule can comprise a single restriction site adjacent to the protein binding site. The molecule is contacted with a nucleic acid having the complementary restriction site overhang, and a ligation reaction is run as illustrated in FIG. 2. When the protein is bound, the ligase cannot react with the restriction site and no ligation occurs. Conversely, when the binding site is unbound, ligation can occur and the attached readout molecule can then be detected.

[0077] Once the state of the flip-flop molecular memory is set, the state can be locked in by cross-linking the nucleic acid binding protein to the nucleic acid. Many cross-linkers suitable for such immobilization are known and include, but are not limited to, agents such as glutaraldehyde, avidin-biotin, and the like. The flip-flop thus provides a “write once, read many” (WORM) memory that is extremely stable to variations in environmental conditions.

[0078] As suggested above, almost any kind of information can thus be encoded into a series of “flip-flops” and read out at a later time. Such combinations of flip-flops provide messages at the molecular level that can provide useful information (e.g., point of manufacture of controlled substances such as drugs or explosives, unique identifiers or authenticators for currency, documents, etc., and the like).

[0079] Where a binding protein having high affinity and stability (e.g., Tus) is used, the message will be relatively stable, and if cross-linked, highly stable, to extreme environmental conditions. The message will also be extremely difficult, if not impossible, to detect by casual observation and will require use of the appropriate assay (e.g., a ligase reaction with the correct readout molecules) for detection and/or readout.

[0080] It will be noted that, in a preferred embodiment, the readout molecule will be designed so that it contains no protein binding site(s) other than those desired. This can be routinely accomplished with the use of “sequence walkers” as described by Schneider (1997) Nucleic Acids Res., 25: 4408-4415.

[0081] B) Molecular Computing

[0082] 1) Combinatorial Tasks

[0083] In digital systems, digital outputs are often generated from digital inputs. For instance, an adder might take two 16-bit numbers as inputs and generate a 16-bit (plus carry) sum. Alternatively, a system might multiply two numbers. Another task might be to compare two numbers to see which is larger, or to compare a set of inputs with a desired input to verify that the two are equivalent. In another embodiment, it might be desirable to attach a “parity bit” to a number to make the total number of 1's even, say before transmission over a data link. The parity could then be checked on receipt (e.g., on subsequent analysis) as a simple check of correct transmission. All of these are tasks in which the output or outputs are predetermined functions of the input or inputs. As a class, they are known as combinatorial tasks. They can all be performed with devices called gates, which perform the operations of Boolean algebra applied to two-state (binary) systems.
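
The parity example is a compact illustration of a combinatorial task, because the even-parity bit is simply the exclusive OR of the data bits. A minimal Python sketch (ordinary software, not a molecular implementation; function names are illustrative):

```python
from functools import reduce
from operator import xor
from typing import List


def even_parity_bit(bits: List[int]) -> int:
    """Bit that makes the total number of 1's (data plus parity) even."""
    return reduce(xor, bits, 0)


def parity_ok(word_with_parity: List[int]) -> bool:
    """Receiver-side check: an even count of 1's means no single-bit error was detected."""
    return reduce(xor, word_with_parity, 0) == 0


data = [1, 0, 1, 1, 0, 0, 0, 0]        # three 1's
sent = data + [even_parity_bit(data)]  # parity bit is 1, giving four 1's
assert parity_ok(sent)
```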

[0084] The term “gate” as used herein refers to a device that returns an output that is a function of one or more inputs. Both the output and the inputs are HIGH or LOW signals, and in the molecular gates of this invention the signals are carried (indicated) by signal proteins, which are, in a preferred embodiment, nucleic acid binding proteins that bind to particular nucleic acid sequences (binding sites).

[0085] In the molecular computers and controls of this invention, the two (binary) signal states are represented either by a protein bound to a nucleic acid at a particular protein binding site (the signal state then being referred to as HIGH, in analogy to electronic systems) or by a binding site being unbound (referred to herein as LOW, in analogy to electronic digital systems).

[0086] It will be appreciated that the state of the various gates can be read and thus provide information (e.g., the result of a computational step) or can be used to control a process. In the latter embodiment, the output state need not be read directly, but can simply result in the upregulation (e.g., where the output signal protein is a transcription factor/enhancer) or downregulation (e.g., where the output signal protein is a repressor) of one or more genes. The gates can also be stacked so that the output of one gate acts as an input for another gate (e.g., one gate activates transcription of a nucleic acid binding protein (signal molecule) that can act as an input for the next gate).

[0087] Gates are well known to those of skill in the art. Basic gates include the AND gate, the OR gate, and the inverter (the NOT function). Other gates include the NOR (NOT OR), the NAND (NOT AND), the exclusive OR (XOR), and so forth. A detailed description of gates can be found, for example, in Horowitz and Hill (1990) The Art of Electronics, Cambridge University Press, Cambridge.

[0088] The construction and use of the NOR gate is described below. It is generally appreciated that the NOR gate is functionally complete: every other type of gate can be constructed from NOR gates alone, so NOR gates are sufficient to provide a functional computer. The use of the NOR gate to construct NOT, OR, AND, and NAND gates is described below. Using the principles outlined herein, other gates can be designed at will.
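
The NOR-only constructions used in this section can be checked exhaustively in software. This Python sketch treats 1 as HIGH (bound) and 0 as LOW (unbound); it models only the Boolean logic, not binding kinetics, and the function names are illustrative.

```python
from itertools import product


def nor(a: int, b: int) -> int:
    """NOR: 1 only when both inputs are 0."""
    return int(not (a or b))


def not_gate(a: int) -> int:
    return nor(a, a)                        # NOR with both inputs tied together


def or_gate(a: int, b: int) -> int:
    return not_gate(nor(a, b))              # OR = NOT(NOR(a, b))


def and_gate(a: int, b: int) -> int:
    return nor(not_gate(a), not_gate(b))    # AND = NOR(NOR(a, a), NOR(b, b))


def nand_gate(a: int, b: int) -> int:
    return not_gate(and_gate(a, b))         # NAND = NOT(AND(a, b))


for a, b in product((0, 1), repeat=2):
    print(a, b, nor(a, b), or_gate(a, b), and_gate(a, b), nand_gate(a, b))
```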

[0089] a) NOR Gate

[0090] The output of a NOR gate is HIGH (able to bind a protein) only when both inputs are LOW (unbound). This can be expressed in a “truth table” as shown in Table 1. In the truth tables shown herein, inputs and outputs refer to particular preselected binding sites on an underlying nucleic acid. The inputs are viewed as HIGH when bound by a nucleic acid binding protein (e.g., Fis, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IHF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, NF-kappa-B, ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like), also referred to herein as a “signal protein”, and as LOW when they are not so bound. The outputs are viewed as HIGH when they are bound or capable of being bound by a nucleic acid binding protein. (A site that is capable of being bound by a polypeptide can be read out as HIGH by providing the signal protein under circumstances where the binding will occur if the site state is HIGH and then detecting the bound protein.) Conversely, the outputs are viewed as LOW when they are not or cannot be bound by a nucleic acid binding protein (i.e., the protein that typically recognizes and binds to that binding site). A “1” in the truth tables shown herein represents a HIGH state, while a “0” represents a LOW state.

TABLE 1
The truth table of a NOR gate.
Input 1 (I₁)   Input 2 (I₂)   Output (O₁)
0              0              1
0              1              0
1              0              0
1              1              0

[0091] As illustrated in Table 1, the NOR gate output is HIGH only when both inputs are LOW. If there are more than two inputs, the NOR gate output is HIGH only when all of the inputs are LOW. If any input is set HIGH, the output of the NOR gate is LOW.

[0092] One example of a molecular NOR gate of this invention is illustrated in FIG. 3. A preferred molecular NOR gate comprises a nucleic acid sequence having at least three protein binding sites: two peripheral “input” binding sites (designated I₁ and I₂ in FIG. 3) bracketing an “output” binding site (designated O₁ in FIG. 3). The protein binding sites are spaced so that when either input site (I₁ or I₂) is bound, i.e., by a signal protein, the bound protein prevents a nucleic acid binding protein (e.g., a second signal protein) from binding the central “output” (O₁) site.

[0093] Under these circumstances, the conditions of Table 1 are met. If either input protein binding site is bound by a protein, the output protein binding site cannot be bound. The only condition under which the output is HIGH (the output site can be bound) is when both inputs are LOW (unbound).
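
The steric argument can be modelled as a blocking relation between sites: the output reads HIGH exactly when it is free and no bound input occludes it. This is a simplified sketch assuming all-or-nothing occupancy; the blocking map and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical blocking map for a FIG. 3-style NOR gate:
# a protein on either input site sterically covers the central output site O1.
BLOCKS: Dict[str, Set[str]] = {"I1": {"O1"}, "I2": {"O1"}}


def site_available(site: str, bound: Set[str], blocks: Dict[str, Set[str]]) -> bool:
    """A site reads HIGH if it is free and no bound neighbour occludes it."""
    return site not in bound and not any(site in blocks.get(b, set()) for b in bound)


def molecular_nor(i1: int, i2: int) -> int:
    """Set the inputs (1 = signal protein bound), then ask whether O1 can be bound."""
    bound = {name for name, level in (("I1", i1), ("I2", i2)) if level}
    return int(site_available("O1", bound, BLOCKS))
```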

[0094] It will be appreciated that in this, and other gates described herein, the protein binding sites can be selected to bind the same or different proteins. However, where more than one site binds (is recognized by) the same protein, there preferably exists a “selector” mechanism that allows the sites (e.g., the two inputs) to be distinguished by the binding protein. In a preferred embodiment, the selector can be a second molecule (e.g., a DNA binding protein that, when bound, blocks the binding of a protein to the respective input or output binding sites). Where the selector molecules differ at each site, the signal molecules can be directed to the particular location simply by use of the appropriate selectors.

[0095] In one embodiment, each input specifically binds a different binding protein and the output binds a third, different binding protein. Alternatively, the two inputs can bind the same type of binding protein while the output binds a different protein. In this instance only two selectors may be required, e.g., one selector at each input to specify whether I₁ or I₂ (or I₃ . . . if there are more inputs) is bound. The readout binding protein can then be added after the input binding step, and no selector is required for the output binding site. From the foregoing explanation, other combinations of the same or different binding proteins and/or selectors will be routinely determined by one of ordinary skill in the art. It will be noted that, in one preferred embodiment, the input and/or output DNA binding proteins include, but are not limited to, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IHF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, and NF-kappa-B, while preferred input and/or output RNA binding proteins include, but are not limited to, ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like.

[0096] It will be appreciated that these and other nucleic acid binding proteins can also act as selectors. Alternatively, the selectors can be restriction endonucleases modified so that they bind, but do not cut, the bound nucleic acid. The selection and/or design of DNA binding proteins and/or selectors is described below in section II.

[0097] b) Coupling the NOR Gate to Another Gate.

[0098] As indicated above, in the design of various gates and more elaborate molecular computing circuits, it is often desirable to couple the output of one gate to the input of another gate, so that the output of one gate acts as the input to one or more other gates.

[0099] For example, the output of a NOR gate can act as the input of a second NOR gate whose two inputs are tied together (i.e., a NOT gate) to produce an OR gate. In this case, the output (O₁) produced by two inputs (I₁ and I₂) is represented algebraically as:

O₁=OR(I₁,I₂)=NOT(NOR(I₁,I₂))

[0100] Coupling the output of one gate (or flip-flop) to the input of another gate (or flip-flop) can be accomplished by a number of means. For example, in one embodiment, a single binding site can act as both the output for one gate and the input for another gate. However, it is generally preferred that the output of one gate or flip-flop and the input of another gate or flip-flop comprise different binding sites. In this instance, the logic elements (gates or flip-flops) can regulate expression of a signaling molecule (binding protein) that acts as an input into one or more logic elements.

[0101] A gate regulating expression of a binding protein is illustrated in FIG. 3. While this figure illustrates a NOR gate regulating gene expression, this can be accomplished with essentially any gate.

[0102] As shown in FIG. 3, readout of the NOR gate is accomplished by contacting the output binding site (O₁ in FIG. 3) with a tethered activator. The tethered activator comprises a binding protein capable of specifically binding to the output binding site (e.g., O₁) attached (directly or through a linker) to a gene activator (e.g., Gal4; see, e.g., Ptashne (1985) Cell 43(3): 729-736). When the output binding site is set HIGH (e.g., by both inputs being set LOW in the NOR gate illustrated in FIG. 3), the binding protein of the tethered activator binds to the output binding site. This anchors the activator, which then activates transcription of a gene under the activator's control.

[0103] The gene encodes a binding protein (signaling molecule) that, once expressed, can bind to the input binding sites of other logic elements (e.g., gates or flip-flops), thereby setting the input(s) HIGH. Where the output site is set LOW, the tethered activator cannot bind and no transcription occurs. No signaling protein is expressed, and the input(s) of “downstream” logic elements stay LOW. Thus, expression of a signaling molecule (binding protein) regulated by one logic element couples the output of that logic element to the inputs of other logic elements.

[0104] It is well known that anchoring a tethered activator can drive gene activation. This was first demonstrated by Ptashne (1985) Cell 43(3): 729-736, who showed that GAL4, a Saccharomyces cerevisiae transcriptional activator, attached to LexA, an Escherichia coli repressor protein, could activate transcription of a gene if and only if a lexA operator (a binding site to anchor the construct) was present near the transcription start site. A number of other tethered activator-binding protein constructs have similarly been shown to activate transcription (see, e.g., Silverman et al. (1993) Proc. Natl. Acad. Sci. USA, 91: 11665-11668; Chrivia et al. (1993) Nature, 365: 855-859; and Pfisterer et al. (1995) Biol. Chem., 270(50): 29870-29880).

[0105] The coupling can also be run with the opposite “sign”. In this embodiment, the gene that encodes the binding protein can be constitutively active. The output of the logic element can then be bound by a tethered repressor that, when bound, switches off gene expression. As the binding protein is cleared from the system, the formerly bound (HIGH) input sites will reset to a LOW state. However, in a preferred embodiment, logic element coupling is accomplished with the use of activators to avoid inadvertent and undesired suppression of the signal protein by free repressor in the solution.

[0106] c) Inverter (NOT) Function.

[0107] A second important combinatorial logic function is the inverter or NOT function. The NOT function returns the complement of a logic level. The NOT function is illustrated by the truth table of Table 2.

TABLE 2
Truth table of the NOT (inverter) function.
Input 1 (I₁)   Output (O₁)
0              1
1              0

[0108] A NOT function returns a LOW signal state when the input is HIGH and a HIGH signal state when the input is LOW. In the context of a molecular inverter, binding a protein to an input (thereby setting the input HIGH) prevents binding of a protein to the output binding site (thereby setting the output LOW).

[0109] Inspection of Table 1 reveals that a NOR gate in which both inputs are equal becomes a NOT gate (inverter function). Thus, where both inputs are set HIGH, the output of the NOR gate is LOW and, conversely, where both inputs are set LOW, the output of the NOR gate is HIGH.

[0110] One embodiment of the NOT gate is illustrated by Gate B in FIG. 3. In this gate, both inputs are copies of the same binding site, and no selectors are used to control which of the two sites is specifically bound.

[0111] When an input binding protein (BP3 in FIG. 3) is present, either or both of I₃ and I₄ are bound and set HIGH. The output site (O₂) is blocked and thereby set LOW. Conversely, when the input binding protein (BP3) is absent, both I₃ and I₄ are unbound and therefore set LOW. The output site (O₂) is unblocked and can therefore be bound. The output is thus set HIGH. Gate B thus conforms to the truth table illustrated in Table 2.

[0112] While the NOT gate is illustrated as a NOR gate in which the inputs are set equal, it will be appreciated that one of the inputs can be eliminated with essentially no change in function. For example, if input 3 (I₃) is eliminated, a LOW input 4 (I₄) will still result in a HIGH output, and vice versa. Indeed, both inputs can be eliminated and a single site can be viewed as self-NOT. However, in this instance it becomes difficult to distinguish input from output unless the signal and readout steps are clearly distinguished (e.g., performed at separate times, or using different input and readout molecules).

[0113] In a preferred embodiment, the NOT gate is a NOR gate having two input binding sites (I₁ and I₂) that are identical. The use of two input sites increases the likelihood of detecting an input signal when signaling protein concentration is low.
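
One way to see why a second copy of the input site helps: if each input site were occupied independently with probability p (an assumption; real occupancy depends on concentration, affinity, and any cooperativity), the output would be blocked with probability 1 − (1 − p)². A short sketch with an illustrative function name:

```python
def p_input_detected(p_site: float, n_sites: int = 2) -> float:
    """Chance that at least one of n_sites input sites is occupied, assuming independence."""
    if not 0.0 <= p_site <= 1.0:
        raise ValueError("p_site must be a probability")
    return 1.0 - (1.0 - p_site) ** n_sites


# With a hypothetical 20% occupancy per site: one site gives about 0.20, two sites about 0.36.
single, double = p_input_detected(0.2, 1), p_input_detected(0.2, 2)
```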

[0114] d) OR Gate

[0115] A NOR gate is essentially an inverted OR gate. The converse is also true.

[0116] Thus, passing the output of a NOR gate through a NOT gate gives OR. Algebraically this may be designated as:

O₁=NOT(NOR(I₁,I₂))=OR(I₁,I₂)

[0117] An OR gate is characterized by the truth table illustrated in Table 3.

TABLE 3
Truth table of an OR gate.
Input 1 (I₁)   Input 2 (I₂)   Output (O₁)
0              0              0
0              1              1
1              0              1
1              1              1

[0118] Generally, an OR gate produces a HIGH output (the output protein binding site is available for binding or bound by a signal protein) when any or all inputs are HIGH (binding sites are bound).

[0119] A molecular OR gate is illustrated in FIG. 3, which shows a NOR gate feeding an output (BP3) into a NOT gate. The output of the NOT gate (O₂) is an OR of the inputs I₁ and I₂ into the NOR gate (gate A). The NOR gate regulates the expression of a gene as described above. The gene encodes a binding protein (BP3 in FIG. 3) that specifically binds to either of the two inputs (BS3 in FIG. 3) of a NOT gate. When either or both inputs of the first NOR gate (gate A) are set HIGH, the activator is not bound to the output and gene activation does not occur. The input of the NOT gate (gate B) is set LOW and the output is thereby set HIGH. Conversely, when both inputs of the NOR gate are unbound (LOW), the transactivator (A) can be anchored at O₁, resulting in the activation of the gene encoding a binding protein (BP3 in FIG. 3). The binding protein sets the input(s) of the NOT gate HIGH, resulting in a LOW output at O₂. Thus the only condition in which O₂ is LOW is when neither input (I₁ or I₂) is HIGH. This conforms with the truth table illustrated in Table 3.
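
The signal path through FIG. 3 can be traced stage by stage in a sketch that treats each stage as an ideal on/off switch. This is an assumption: it ignores expression delays, protein turnover, and partial occupancy. Function names are illustrative.

```python
def gate_a_output_high(i1: bool, i2: bool) -> bool:
    """NOR gate A: O1 can be bound only when neither input site is occupied."""
    return not (i1 or i2)


def bp3_expressed(o1_high: bool) -> bool:
    """Tethered activator anchors at a HIGH O1 and switches on the BP3 gene (ideal switch)."""
    return o1_high


def gate_b_output_high(bp3_present: bool) -> bool:
    """NOT gate B: BP3 on I3/I4 blocks O2; without BP3, O2 is free (HIGH)."""
    return not bp3_present


def molecular_or(i1: bool, i2: bool) -> bool:
    """Full cascade: NOR gate A drives BP3 expression, which feeds NOT gate B."""
    return gate_b_output_high(bp3_expressed(gate_a_output_high(i1, i2)))
```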

[0120] As indicated above, one or both of the inputs of the NOT function (gate B in FIG. 3) can optionally be eliminated. In particular, where the NOT gate provides input to another gate or function by anchoring a second transactivator, both inputs can be eliminated.

[0121] Then, when both inputs to gate A (I₁ and I₂) are set LOW, the binding protein BP3 will be expressed, bind to the single site (O₂), and thereby prevent anchoring of the second transactivator.

[0122] e) AND Gate

[0123] The output of an AND gate is HIGH (able to bind a protein) only when both inputs are HIGH. An AND gate can be constructed from NOR gates as:

AND(I₁,I₂)=NOR(NOR(I₁,I₁), NOR(I₂,I₂))

[0124] This can be expressed in a “truth table” as shown in Table 4. Again, as described above, inputs and outputs refer to particular preselected protein binding sites.

TABLE 4
The truth table of an AND gate constructed of NOR gates.
Input I₁   Input I₂   X = NOR(I₁, I₁)   Y = NOR(I₂, I₂)   Output O₁ = NOR(X, Y)
0          0          1                 1                 0
0          1          1                 0                 0
1          0          0                 1                 0
1          1          0                 0                 1

[0125] One AND gate of this invention is illustrated in FIG. 4. The AND gate consists of three NOR gates (described above). The first two NOR gates (gates A and B in FIG. 4) accept inputs I₁ and I₂, respectively. It will be noted that the two input binding sites in each of these NOR gates are the same, so that in effect these NOR gates act as NOT functions. When the inputs of the NOR gates are set LOW, a binding protein (BP1 in FIG. 4) binds to the output binding site(s) (O₁ and/or O₂), thereby anchoring a transactivator that activates transcription of binding proteins from each of the NOR gates (BP2 and/or BP3 in FIG. 4). The binding proteins then act as inputs into the third gate (gate C). The only condition under which the output (O₃) of the third NOR gate is HIGH is when both I₁ (gate A) and I₂ (gate B) are HIGH, resulting in no transcription of BP2 or BP3. This satisfies the conditions of Table 4.

[0126] While the AND gate is illustrated as a series of NOR gates in FIG. 4, gates A and B are actually NOT functions, so in some embodiments one or both of the input sites of each can be eliminated. Thus, the AND gate can be reduced to a collection of five binding sites as illustrated in FIG. 5a. However, in this instance, it is preferable to use different binding sites for the inputs so as to distinguish them.

[0127] Another simplified AND gate is illustrated in FIG. 5b. This AND gate consists of five protein binding sites as well. Each pair of adjacent binding sites is capable of acting as a flip-flop (i.e., the constituent sites in a pair are mutually exclusive). When either or both of I₁ and I₂ are unbound (LOW), a binding protein can bind to the adjacent BS2 site(s). When either BS2 site is bound, site BS3 is blocked and O₁ is thus set LOW. The only condition under which BS3 can be set HIGH is when both I₁ and I₂ are bound. Then no protein binds at BS2, and BS3 (O₁) is unblocked (HIGH).
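
The five-site AND gate of FIG. 5b can be simulated as a chain in which adjacent sites exclude each other. The sketch assumes the binding steps happen in order (inputs first, then the BS2-binding protein in excess, then readout of BS3) and that occupancy is all-or-nothing; these are simplifications, not statements about the actual kinetics, and the names are illustrative.

```python
from typing import List

# Assumed linear order of the five sites in FIG. 5b: I1, BS2, BS3 (= O1), BS2, I2.


def bind_if_free(occupied: List[bool], positions: List[int]) -> None:
    """Occupy each listed position unless an adjacent site is already occupied."""
    for p in positions:
        left = occupied[p - 1] if p > 0 else False
        right = occupied[p + 1] if p + 1 < len(occupied) else False
        if not (left or right):
            occupied[p] = True


def and_gate_fig5b(i1: bool, i2: bool) -> bool:
    occupied = [i1, False, False, False, i2]  # step 1: input signal proteins bind
    bind_if_free(occupied, [1, 3])            # step 2: BS2-binding protein added in excess
    return not (occupied[1] or occupied[3])   # step 3: BS3 (O1) is HIGH only if both BS2 are empty
```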

[0128] f) NAND Gate

[0129] The truth table of a NAND (NOT AND) gate is shown in Table 5. The NAND gate is essentially an inverted AND gate. This gate produces a LOW output only when both inputs are set HIGH.

TABLE 5
The truth table of a NAND gate.
Input 1 (I₁)   Input 2 (I₂)   Output (O₁)
0              0              1
0              1              1
1              0              1
1              1              0

[0130] A molecular NAND gate of this invention is illustrated in FIG. 4. As explained above, gates A, B, and C form an AND gate. The output of this AND gate (BP5) is run through a NOT gate (gate D), which inverts the signal, producing a NAND gate at output O₄ in response to inputs I₁ and I₂.

[0131] g) Other Gates

[0132] Using the principles described above, virtually any type of gate can be constructed from the proper combination of NOR and NOT gates. The identities of various gates are well known to those of skill in the art and can also be determined from first principles of Boolean algebra. Various gates are illustrated in Horowitz and Hill, supra, as well as in numerous other references pertaining to digital circuitry.
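
As an example of a gate not detailed above, the exclusive OR (XOR) can be assembled from NOR gates alone. The five-gate arrangement below is one standard construction, offered as a software sketch of the Boolean logic; the text does not specify a particular molecular XOR layout, and the function names are illustrative.

```python
def nor(a: int, b: int) -> int:
    """NOR: 1 only when both inputs are 0."""
    return int(not (a or b))


def xor_from_nor(a: int, b: int) -> int:
    """Exclusive OR built from five NOR gates."""
    n1 = nor(a, b)
    n2 = nor(a, n1)
    n3 = nor(b, n1)
    xnor = nor(n2, n3)        # four NOR gates give XNOR
    return nor(xnor, xnor)    # a fifth NOR, inputs tied, inverts it to XOR
```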

[0133] While the NOR and NOT gates comprising the various constructs in FIGS. 3, 4, and 5 are illustrated as separate nucleic acids, it will be appreciated that the various elements, gates, and even complex digital circuits can exist on and be encoded by a single nucleic acid.

[0134] It is also noted that the same species of tethered transactivator (i.e., the same binding protein/activator protein combination) can couple numerous different gates. By placing this activator under logic control, entire circuits can be controlled by a single input.

[0135] 2) Sequential Logic: The Basic Flip-Flop.

[0136] In combinatorial circuits, the output is determined completely by the existing state of the inputs. There is no “memory”, no history, in these systems. In contrast, in sequential logic systems the output is not determined entirely by the input, but is also affected by the history of the system.

[0137] When devices with “memory” are added to the system, it becomes possible to construct counters, accumulators, and other functions that have a historical element.

[0138] The basic unit of storage is the “flip-flop”. Generally speaking, a flip-flop is a device (or system) that has two stable states; it is said to be “bistable”. Which state the flip-flop is in depends on its past history.

[0139] a) The basic Flip-Flops and Storage Registers.

[0140] A basic flip-flop of this invention is illustrated in FIGS. 1a and 1b. A nucleic acid is provided having two protein binding sites positioned such that they cannot simultaneously be occupied by a nucleic acid binding protein. Thus, if a first site (e.g., BS1 in FIG. 1a) is bound (HIGH), the second site (BS2 in FIG. 1a) is unbound (LOW); conversely, if the second site is bound (HIGH), the first site is unbound (LOW). The flip-flop thus has two stable states: the first site HIGH and the second site LOW, or the second site HIGH and the first site LOW.

[0141] Binding of the signal polypeptide (BP1 or BP2 in FIGS. 1a and 1b) can be reversible or irreversible. Where the binding is irreversible, the flip-flop acts as a read-only storage device (read-only memory, ROM) and each flip-flop stores one bit of information. As explained above, the flip-flops can be combined (e.g., to form storage registers) and can ultimately encode vast amounts of information. Such registers include at least two and may include even more flip-flops (e.g., up to 3, 4, 5, 6, . . . 4096 or even more flip-flops).
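With irreversible binding, each flip-flop behaves as one write-once bit, so a register of n flip-flops distinguishes 2^n values. The sketch below uses a hypothetical `ReadOnlyRegister` class and assumes an arbitrary encoding in which BS1 bound means 0 and BS2 bound means 1. It does not describe any specific construct.

```python
from typing import List

class ReadOnlyRegister:
    """A row of flip-flops with irreversible binding (write-once, ROM-like).

    Hypothetical encoding: state 'BS1' stores 0 and state 'BS2' stores 1.
    """

    def __init__(self, width: int) -> None:
        self.states: List[str] = [""] * width   # "" means not yet set

    def write(self, value: int) -> None:
        if any(self.states):
            raise RuntimeError("irreversible binding: register already written")
        if not 0 <= value < 2 ** len(self.states):
            raise ValueError("value does not fit in the register")
        for i in range(len(self.states)):
            bit = (value >> i) & 1               # least-significant flip-flop first
            self.states[i] = "BS2" if bit else "BS1"

    def read(self) -> int:
        return sum(1 << i for i, s in enumerate(self.states) if s == "BS2")
```

For example, a 12-flip-flop register holds any of 4096 values, and a second `write` raises an error because irreversible binding cannot be undone.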

[0142] One particularly preferred flip-flop comprises a deoxyribonucleic acid (DNA) having two Fis binding sites. The sites are spaced apart by less than 23 base pairs (bp), preferably less than about 20 bp, more preferably less than about 15 bp, and most preferably less than about 12 bp. In one most preferred embodiment, the Fis binding sites are 7 or 11 base pairs apart (see, e.g., Example 1), but it will be appreciated that the sites can fully overlap or can be separated by a spacing of 1-11 base pairs. As illustrated in FIG. 8, spacing (expressed in base pairs) refers to the shift in base pairs from fully overlapping sites. Thus a spacing of 0 means the two sites fully overlap, and a spacing of 7 bp means that the binding sites are displaced relative to each other by 7 base pairs. In the case of the Fis binding sites of FIG. 8, where the binding site is 21 bp in length, sites spaced 7 bp apart still share a 14 bp overlap (see, e.g., FIG. 8).
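The spacing convention reduces to simple arithmetic. For two sites of equal length L displaced by s bp, the shared region is L - s bp, or zero once s reaches L. The hypothetical helper below only computes that overlap. It says nothing about whether a given overlap actually excludes binding, which the text establishes experimentally.

```python
def overlap_bp(site_length: int, spacing: int) -> int:
    """Base pairs shared by two equal-length sites displaced by `spacing` bp.

    A spacing of 0 means the sites fully overlap; once the spacing reaches
    the site length, the sites no longer share any base pairs.
    """
    if site_length <= 0 or spacing < 0:
        raise ValueError("site_length must be positive and spacing non-negative")
    return max(site_length - spacing, 0)

FIS_SITE_BP = 21  # Fis binding-site length given for FIG. 8

for spacing in (0, 7, 11):
    print(spacing, overlap_bp(FIS_SITE_BP, spacing))
```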

[0143] In a preferred embodiment, the Fis binding sites are selected to provide a binding strength of at least about 0 bits, preferably at least about 1 bit, more preferably at least about 2 bits, and most preferably at least about 2.4 bits as determined by individual information theory (see, Hengen et al. (1997) supra.). In one preferred embodiment, one or more of the binding sites in a molecular flip-flop or gate of this invention is a Fis binding site having the sequence TTTG(G/C)TCAAAATTTGA(G/C)CAAA (SEQ ID NO: 1). The binding sites are preferably spaced from 7 to about 11 base pairs apart and more preferably are spaced 7 or 11 base pairs apart (see, e.g., Example 1). Particularly preferred paired binding sites include, but are not limited to, those with 11 bp spacing (e.g., 5′-TATTCTTTGCTCAAAATTTGATCAAATTTTGAGCAAAGAATA-3′, SEQ ID NO: 2) and 7 bp spacing (e.g., 5′-AGGCTTTTGCTCAAAGTTTAAACTTTGAGCAAAAGCCT-3′, SEQ ID NO: 3), whose walker maps are illustrated in FIG. 8. Particularly preferred Fis-based flip-flops are illustrated in the Examples.

[0144] b) Setting the State of the Flip-Flop.

[0145] The state of the flip-flop is set by binding a protein to either the first or the second binding site (BS1 or BS2 in FIGS. 1a and 1b). The state can be set randomly or, alternatively, the flip-flop can be set to a particular preselected state (i.e., it is predetermined whether the first or second binding site will be occupied). Where the two binding sites bind characteristically different nucleic acid binding proteins (e.g., Fis binds at the first site and CRP binds at the second site), the state of the flip-flop can be set by providing one of the two binding proteins. Alternatively, the flip-flop can be used to read the relative abundance of the two proteins. To the extent one binding protein is present in greater concentration than the other, the predominance of flip-flops in one state as opposed to the other will indicate the relative abundance of the proteins. The flip-flop may be used alone to achieve such a readout or may be used in conjunction with one or more other flip-flops and/or gates to provide such a readout as described below.
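The relative-abundance readout can be made quantitative under a simplifying assumption. Suppose each flip-flop is captured by whichever protein binds first, and capture rates scale with concentration times a relative association constant. The fraction of flip-flops in state 1 is then k1[P1]/(k1[P1] + k2[P2]). The sketch below encodes that hypothetical kinetic model. `k1` and `k2` are illustrative parameters, not measured values.

```python
def expected_fraction_state1(conc1: float, conc2: float,
                             k1: float = 1.0, k2: float = 1.0) -> float:
    """Fraction of flip-flops expected to end with protein 1 bound.

    Hypothetical model: first binder wins, and capture rate is proportional
    to k * concentration for each protein.
    """
    r1, r2 = k1 * conc1, k2 * conc2
    if r1 + r2 <= 0:
        raise ValueError("at least one protein must be present")
    return r1 / (r1 + r2)

def estimate_ratio(n_state1: int, n_state2: int,
                   k1: float = 1.0, k2: float = 1.0) -> float:
    """Invert the model: estimate conc1/conc2 from counted flip-flop states."""
    if n_state2 == 0:
        raise ValueError("need at least one flip-flop in state 2")
    return (n_state1 / n_state2) * (k2 / k1)
```

With equal constants, 750 flip-flops in state 1 and 250 in state 2 would suggest a 3:1 concentration ratio. Unequal affinities shift that estimate through the `k2 / k1` factor.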

[0146] Alternatively, both binding sites of the flip-flop may bind to the same binding protein (e.g., Fis). In this case, setting the flip-flop to a particular predetermined state may require the use of a selector.

[0147] A selector is a moiety that prevents binding of the binding protein to a particular binding site. Thus, for example, in FIG. 1b, if a selector is bound to selector binding site S₁, then binding site BS1 cannot be occupied and the flip-flop is set with site BS2 HIGH. Conversely, if selector binding site S₂ in FIG. 1b is occupied, the binding protein can only bind to site BS1, and BS1 is then set HIGH.
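The selector logic amounts to a small lookup. In this sketch, `S2` is assumed to be the selector site guarding BS2, while S₁ guards BS1 as stated in the text. The function name is hypothetical.

```python
from typing import Optional

def set_state(selector_at: Optional[str]) -> str:
    """Which binding site ends up HIGH when one protein can bind BS1 or BS2.

    selector_at: 'S1' blocks BS1 and 'S2' (assumed) blocks BS2. None means no
    selector, so the outcome is not predetermined and is reported as 'random'.
    """
    blocked = {"S1": "BS1", "S2": "BS2"}
    if selector_at is None:
        return "random"
    if selector_at not in blocked:
        raise ValueError(f"unknown selector site: {selector_at}")
    # The protein can only occupy the unblocked site, which is therefore HIGH.
    return "BS2" if blocked[selector_at] == "BS1" else "BS1"
```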

[0148] Selectors are discussed below. However, at this point it is noted that the selector can be a binding protein or any other moiety that selectively blocks the binding site with which it is associated. Thus, the selector can be a modified restriction endonuclease (e.g., EcoRI) that binds to, but does not cleave, the underlying nucleic acid (see, e.g., King et al. (1989) J. Biol. Chem. 264(20): 11807-11815). The selector could also be a chemical that selectively modifies the underlying nucleic acid (e.g., by base modification, thymidine dimerization, etc.) to prevent attachment of the binding protein. Other possible selectors include nucleic acids (e.g., antisense molecules), peptide nucleic acids, streptavidin/biotin, and the like.

[0149] c) Resetting the State of the Flip-Flop.

[0150] As indicated above, the flip-flop can use proteins that bind nucleic acids irreversibly and thus act as a read-only memory component. Alternatively, binding proteins can be used that can be released from the flip-flop. Thus the state of the flip-flop can be set by binding a binding protein to one site and then reset by releasing that protein (and preferably binding a protein to the other site).

[0151] Nucleic acid binding proteins that bind reversibly are known to those of skill in the art and include, but are not limited to, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IHF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, NF-kappa-B, ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like.

[0152] In addition, non-naturally occurring binding proteins can be obtained by the mutation and selection of natural binding proteins or by de novo synthesis. The identification and preparation of suitable nucleic acid binding proteins is described below in Section II.

[0153] 3) Combinations of Gates to Form Logic Circuits.

[0154] It will be appreciated that logic gates and flip-flops of this invention can be combined in a wide variety of ways to process signals. This involves coupling the output of one gate to an input of another gate. This is illustrated above, where the output of the NOR gate is coupled to (provides) the input of the inverter to produce an OR gate. Similarly, the output of the AND gate is coupled to an inverter to produce a NAND gate. Of course, more than two gates may be coupled into a circuit and the coupling may be to gates other than an inverter. Indeed, virtually any type of gate can be coupled to any other type of gate.

[0155] Using the gates illustrated herein, numerous other logic functions will be apparent to those of skill in the art. Moreover, extending the principles of gate formation and coupling described above, various combinations of gates and/or flip-flops can be combined to produce complex computational, signal-processing, and/or control circuits. One example of such a circuit is the use of gates to produce a simple adder. The adder circuit is well known and described, for example, by Gonick (1983) The Cartoon Guide to Computer Science, Barnes & Noble Books, New York, N.Y.

[0156] Briefly, the components of a one-bit adder can be made from XOR and AND gates and a carry bit. When the carry is 0, the SUM is the XOR of the addends, while the CARRY is the AND of the addends. Thus, where c is a previous carry, a is one addend, and b is the other addend, the adder meets the conditions of Table 6.

TABLE 6
Adder logic in which carry bit is 0.
                 Sum       Carry
c   a   b     a XOR b   a AND b   Value
0   0   0        0         0      0 + 0 = 0
0   0   1        1         0      0 + 1 = 1
0   1   0        1         0      1 + 0 = 1
0   1   1        0         1      1 + 1 = 10 (i.e., 2)

[0157] As evidenced by Table 6, this is indeed the sum and carry. When the input carry is 1, the table changes a little, as shown in Table 7.

TABLE 7
Adder logic in which carry bit is 1.
                  Sum            Carry
c   a   b     NOT(a XOR b)    a OR b   Value
1   0   0          1             0     1 + 0 + 0 = 1
1   0   1          0             1     1 + 0 + 1 = 10 (i.e., 2)
1   1   0          0             1     1 + 1 + 0 = 10 (i.e., 2)
1   1   1          1             1     1 + 1 + 1 = 11 (i.e., 3)

[0158] If there is a carry from the previous one-bit adder, the SUM is NOT(XOR(a,b)) and the CARRY is OR(a,b). The input carry c can be used to select between these states, giving the equations:

SUM = (NOT(c) AND XOR(a,b)) OR (c AND NOT(XOR(a,b)))
CARRY = (NOT(c) AND AND(a,b)) OR (c AND OR(a,b))
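The carry-select equations can be checked against ordinary binary addition for all eight input combinations and then chained into a ripple-carry adder. This is a minimal Python sketch of the Boolean logic only. `full_adder` and `ripple_add` are hypothetical names, and bits are modelled as the integers 0 and 1.

```python
from itertools import product
from typing import Tuple

def XOR(a: int, b: int) -> int:
    return a ^ b

def full_adder(c: int, a: int, b: int) -> Tuple[int, int]:
    """One-bit adder written directly from the SUM and CARRY equations above."""
    not_c = 1 - c
    s = (not_c & XOR(a, b)) | (c & (1 - XOR(a, b)))
    carry = (not_c & (a & b)) | (c & (a | b))
    return s, carry

def ripple_add(x: int, y: int, width: int) -> int:
    """Chain `width` one-bit adders, feeding each CARRY into the next stage."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder(carry, (x >> i) & 1, (y >> i) & 1)
        total |= s << i
    return total | (carry << width)   # the final carry becomes the top bit

if __name__ == "__main__":
    for c, a, b in product((0, 1), repeat=3):
        print(c, a, b, full_adder(c, a, b))
```

Each row of Tables 6 and 7 corresponds to one `full_adder` call. `ripple_add` shows how the carry output of one adder serves as the c input of the next.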

[0159] As indicated above, other logic functions can be produced using similar approaches. Signal processing and control circuits comprising a multiplicity of coupled gates are well known to those of skill in the art (see, Horowitz and Hill, supra.).

[0160] While the NOR and NOT gates comprising the various constructs in FIGS. 3, 4, and 5 are illustrated as separate nucleic acids, it will be appreciated that the various elements, gates, and even complex digital circuits can exist on and be encoded by a single nucleic acid. The combination of flip-flops and NOR gates with activators leading to transcription of the next logic level means that individual yeast (or other) cells could be used to perform molecular logic.

[0161] It will also be appreciated that these circuits are essentially equivalent to their digital electronic counterparts. These circuits, however, exist on the molecular level. Moreover, they can be produced in enormous numbers (e.g., through simple expression from appropriate vectors) and thus are amenable to massive parallelism, and are therefore well adapted to the solution of certain classes of complex computational problems.

[0162] It is noted, for example, that nucleic acid constructs have been used to solve a directed Hamiltonian path problem (Adleman (1994) Science, 266: 1021-1024). Moreover, following Schneider (1991) J. Theoret. Biol., 148: 125, Adleman concluded that molecular computation approaches theoretical efficiencies as much as 10 orders of magnitude greater than conventional supercomputers.

[0163] 4) Signal Readout.

[0164] A wide variety of means can be used to read the status of one or more gate outputs or flip-flops. As indicated above, a gate output is read as HIGH where it is bound or capable of being bound by a signal protein. Thus, in a preferred embodiment, the output status can be determined simply by ensuring that the output site is contacted with an appropriate signal protein under conditions in which, if the site is HIGH and unbound, the signal protein can bind, and then determining the presence or absence of a bound signal protein. Methods of identifying the binding of proteins to nucleic acids are well known to those of skill in the art. In one embodiment, this can be accomplished by gel shift assays as described herein in the Examples (see, also, Lane et al. (1992) Microbiol. Rev. 56(4): 509-528 and Garner et al. (1981) Nucl. Acids Res., 9(13): 3047-3060).

[0165] However, the binding can be more easily assayed by detecting the binding of a labeled signal molecule. Means of labeling proteins are well known to those of skill in the art (see, for example, Chapter 4 in Monoclonal Antibodies: Principles and Applications, Birch and Lennox, eds. John Wiley & Sons, Inc. N.Y. (1995), which describes conjugation of antibodies to labels and other moieties). Proteins contain a variety of functional groups, e.g., carboxylic acid (COOH) or free amine (—NH₂) groups, which are available for reaction with a suitable functional group on either the label or on a linker attached to the label.

[0166] Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical, genetic, or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas Red, rhodamine, green fluorescent protein, and the like; see, e.g., Molecular Probes, Eugene, Oreg., USA), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase, and others commonly used in an ELISA), selectable markers (e.g., antibiotic resistance genes), and colorimetric labels such as colloidal gold (e.g., gold particles in the 40-80 nm diameter size range scatter green light with high efficiency) or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

[0167] In another embodiment, the binding of a protein to a particular site on a nucleic acid can be determined from changes in fluorescence of a fluorophore attached either to the binding protein or to the nucleic acid, caused on protein binding by energy transfer between the fluorophore and a quencher molecule or a second fluorophore (e.g., a fluorescence resonance energy transfer system). Thus, for example, a lumazine derivative has been used in conjunction with a bathophenanthroline-ruthenium complex as an energy transfer system in which the lumazine derivative acted as an energy donor and the ruthenium complex acted as an energy acceptor. The lumazine derivative and ruthenium complex were attached to different nucleic acids. Energy transfer occurred when the two compounds were brought into proximity, resulting in fluorescence. The system provided a mechanism for studying the interaction of molecules bearing the two groups (see, e.g., Bannwarth et al. (1991), Helvetica Chimica Acta. 74: 1991-1999, Bannwarth et al. (1991), Helvetica Chimica Acta. 74: 2000-2007, and Bannwarth et al., European Patent Application No. 0439036A2).

[0168] Such resonance energy transfer systems are easily adapted to detect protein-nucleic acid interactions as well. This involves placing the fluorophore or quencher near the particular binding site it is desired to “read” and then detecting the change in fluorescence as the respectively labeled protein binds to that site. Alternatively, the binding protein can carry the fluorophore while the DNA carries the quencher, or vice versa, and the intensity of fluorescence will then indicate the state of the flip-flop. Other energy transfer systems are well known to those of skill in the art (see, e.g., Tyagi et al. (1996) Nature Biotechnology, 14: 303-308). Such systems can also readily distinguish between the two stable states of a flip-flop simply by providing different quenchers or different fluorophores at each binding site.

[0169] As suggested above, the state of a flip-flop or gate can be locked prior to reading by covalently attaching the nucleic acid binding protein (if present) to the underlying nucleic acid. This can be accomplished by the use of cross-linkers. Protein and nucleic acid cross-linkers are well known to those of skill in the art and include, but are not limited to, glutaraldehyde, disuccinimidyl suberate (DSS) (Pierce, Rockford, Ill., USA), active esters of N-ethylmaleimide (see, e.g., Lerner et al. (1981) Proc. Natl. Acad. Sci. USA. 78: 3403-3407 and Kitagawa et al. (1976) J. Biochem., 79: 233-236), and the like.

[0170] In another embodiment, the state of the flip-flop and/or gate can be read by the use of a surrogate marker (a readout molecule) that can optionally preserve the state information even after the binding protein(s) is removed from the underlying nucleic acid. One example of the use of such a readout molecule is described above in the use of ligation reactions to attach a readout nucleic acid to one or the other end of a flip-flop.

[0171] Another surrogate readout molecule utilizes an avidin (streptavidin)/biotin interaction. In this embodiment, a biotin is attached to the underlying nucleic acid (preferably via a linker). The biotin is located near one of the flip-flop binding sites or near the output site of a gate. After the state of the gate or flip-flop is set, the logic element is contacted with an avidin or streptavidin molecule. When the binding site is occupied by a nucleic acid binding protein (HIGH), the binding protein will block the interaction of the streptavidin with the biotin. Conversely, when the binding site is unbound (LOW), the streptavidin can bind the biotin. The bound streptavidin can be detected using standard methods (e.g., gel shift assay or labeled streptavidin), and a bound streptavidin will indicate a LOW binding site state. The biotin/streptavidin interaction is extremely stable, and the state can be read even after the nucleic acid binding protein is dissociated from the underlying nucleic acid. An illustration of this readout system is provided in Example 2.
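Streptavidin can only reach the biotin when the adjacent site is empty, so this readout is inverted relative to the site state. The sketch below assumes a biotin next to BS1 only and an already-set flip-flop. Under those assumptions, BS2 can be inferred from mutual exclusion. Both the placement and the function names are hypothetical.

```python
from typing import Dict

def site_state_from_streptavidin(streptavidin_bound: bool) -> str:
    # Streptavidin reaches the nearby biotin only when the binding protein is
    # absent, so bound streptavidin reports a LOW site.
    return "LOW" if streptavidin_bound else "HIGH"

def read_flip_flop(streptavidin_near_bs1: bool) -> Dict[str, str]:
    """Read both sites when biotin sits next to BS1 only (assumed placement).

    BS2 is inferred from mutual exclusion, which presumes the flip-flop has
    been set so that exactly one site is occupied.
    """
    bs1 = site_state_from_streptavidin(streptavidin_near_bs1)
    bs2 = "LOW" if bs1 == "HIGH" else "HIGH"
    return {"BS1": bs1, "BS2": bs2}
```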

[0172] C) Complex (“digital”) Control of Gene Expression

[0173] In another embodiment, the signal protein need not be labeled at all in order to detect the state of the output site. For example, where a tethered activator is used to bind to the output site (e.g., as shown in FIG. 3), binding of the tethered activator will induce activation of the appropriately located gene. Where the input signal serves to block an output site (e.g., in the NOR gate), the tethered activator will fail to bind and gene transcription will not occur. Detection of the expression of a gene or genes under the control of these tethered activators (or repressors) then provides an indication of the status of the output. Thus, for example, where the tethered activator is a transcription activator, upregulation of gene transcription indicates a HIGH state at the output. Conversely, where the tethered construct protein is a repressor and the gene is constitutively activated, a decrease in transcription indicates a HIGH output state.

[0174] In this case, where it is desired simply to “read” the output state, the gene or genes under the control of the gate will typically be reporter gene(s). A reporter gene is a gene that codes for a protein whose activity is easily detected, allowing cells expressing such a marker to be readily identified. Such markers are well known to those of skill in the art and include, but are not limited to, glucuronidase, bacterial chloramphenicol acetyl transferase (CAT), beta-galactosidase (β-gal), various bacterial luciferase genes encoded by Vibrio harveyi, Vibrio fischeri, and Xenorhabdus luminescens, the firefly luciferase gene FFlux, green fluorescent protein, and the like.

[0175] Selectable markers (e.g., antibiotic resistance genes) provide another simple mechanism for reading out the state of a flip-flop. In this embodiment, the gene or one of the genes under control of a logic element of this invention is a selectable marker. Cells containing the logic cassette are grown under selection conditions (e.g., in the presence of one or more antibiotics), and survival of the cells indicates the state of the logic element.

[0176] In another embodiment, the logic gates of this invention can be used for expression control rather than computation. In this case, the reporter gene can be substituted with one or more genes whose expression it is desired to regulate using the logic apparatus of this invention. Such genes include, but are not limited to, the multidrug resistance gene (MDR1), e.g., to confer drug resistance on healthy cells during chemotherapy, the p53 tumor suppressor gene, e.g., in certain breast cancers, the telomerase gene, and the like.

[0177] This invention thus provides means of regulating gene expression in response to complex stimuli. A simple example is illustrated in FIG. 3, which shows a gene under the control of a NOR gate. The output of the NOR gate is “read” by a tethered activator having, at one end, a binding protein that specifically recognizes the output binding site and, at the other end, a gene activator (A). In this case, when either input 1 (I₁) or input 2 (I₂) is bound, the tethered construct cannot anchor to the output site (O₁), and the activator attached to the tethered construct fails to activate the gene. However, when both inputs are unbound, the tethered construct attaches to the output site (O₁), anchoring the activator to the nucleic acid where it can then activate transcription of the gene.
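The FIG. 3 arrangement can be summarized as a truth table mapping input occupancy to transcription. The sketch below covers both the tethered activator described here and the tethered-repressor alternative for a constitutively active gene. It models the logic only, not expression levels, and its names are hypothetical.

```python
def nor_output_free(i1_bound: bool, i2_bound: bool) -> bool:
    # Output site O1 is available to the tethered construct only when
    # neither input site is occupied (the NOR gate of FIG. 3).
    return not (i1_bound or i2_bound)

def transcribed(i1_bound: bool, i2_bound: bool, tether: str = "activator") -> bool:
    """Is the controlled gene transcribed?

    'activator': gene is off unless the tethered activator anchors at O1.
    'repressor': gene is constitutively on unless the tethered repressor
    anchors at O1.
    """
    anchored = nor_output_free(i1_bound, i2_bound)
    if tether == "activator":
        return anchored
    if tether == "repressor":
        return not anchored
    raise ValueError("tether must be 'activator' or 'repressor'")
```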

[0178] The source of the two input (signal) proteins can be exogenous (preferably delivered to the cell by a transporter, e.g., a liposome or other vehicle) or heterologous. Alternatively, one or more of the signal proteins can be the product of endogenous metabolic pathways in the cell, or either or both can be the product of heterologous expression cassettes. These cassettes can themselves be simple traditional cassettes (inducible or constitutive) or can be “logic” cassettes having genes under the control of one or more logic gates and/or flip-flops of this invention. Similarly, the tethered activator(s) can be exogenously supplied or, alternatively, particularly where the linker joining the binding protein and the activator is itself a polypeptide (thereby providing a tethered activator that is a fusion protein), the tethered activator can be expressed as a heterologous polypeptide by an appropriate expression cassette.

[0179] In one embodiment, the regulated gene can itself be a nucleic acid encoding a binding protein, or multiple genes can be expressed by the “logic cassette”, one or more of which is another signaling protein (nucleic acid binding protein) as described above. The expressed nucleic acid binding proteins can act as inputs to the logic cassette, providing negative or positive regulation of gene expression by that cassette. Alternatively, or in combination with such feedback regulation, the expressed binding proteins can act as inputs into other logic cassettes, thereby allowing cascade regulation of multiple logic cassettes and extremely complex regulation of the gene or genes expressed by the logic cassettes.

[0180] The logic-based expression control systems of this invention thus provide a vast improvement in regulation of the expression of heterologous genes (or cDNAs). In traditional heterologous gene expression systems, gene expression is typically regulated by a single inducer (e.g., IPTG). In contrast, expression of a gene (or cDNA) under the control of one or more logic elements as described above can be the result of a complex combination of stimuli, including one or more inducers, positive and negative feedback regulation, and the like.

[0181] D) Affinity Chromatography/Analyte Quantification.

[0182] In a somewhat more mundane but highly useful application, the flip-flops of this invention can be used to efficiently quantify the amount of an analyte in a solution (e.g., in a biological sample such as a cell homogenate, blood, etc.). In this embodiment, both binding sites of the flip-flop are identical and selected to specifically bind the analyte of interest (e.g., a nucleic acid binding protein (e.g., Fis) or a nucleic acid, etc.). The sites are still positioned, however, so that binding of the target analyte at one site excludes binding at the other site.

[0183] The provision of two binding sites on each nucleic acid molecule enhances the probability of binding, as compared to a system in which each nucleic acid bore a single binding site. Because the number of binding sites is doubled, the likelihood of the target analyte finding and correctly orienting to an appropriate binding site is increased. However, once one site is bound, the second binding site becomes unavailable. Thus, the number of bound nucleic acids is equivalent to the number of bound target analytes. Quantification of the bound nucleic acid thus provides a direct measure of the quantity of bound target analyte and an indirect measure of the amount of analyte in the solution.
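The counting argument can be illustrated with a toy simulation. The two sites on each molecule exclude each other, so no molecule carries more than one analyte, and bound nucleic acids always equal bound analytes. The model below is a hypothetical simplification with one random encounter per analyte. It does not estimate the binding-enhancement effect.

```python
import random
from typing import Tuple

def bind_analytes(n_molecules: int, n_analytes: int, seed: int = 0) -> Tuple[int, int]:
    """Toy model: each analyte hits one of 2 * n_molecules sites at random.

    Both sites on a molecule recognize the analyte but are mutually
    exclusive. If the molecule is already occupied, the encounter fails and
    the analyte stays in solution. Returns (bound molecules, bound analytes).
    """
    rng = random.Random(seed)
    occupied = [False] * n_molecules      # one flag per nucleic acid molecule
    bound_analytes = 0
    for _ in range(n_analytes):
        site = rng.randrange(2 * n_molecules)
        molecule = site // 2              # two sites per molecule
        if not occupied[molecule]:
            occupied[molecule] = True
            bound_analytes += 1
    return sum(occupied), bound_analytes
```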

[0184] The amount of bound nucleic acid can be quantified by a number of means well known to those of skill in the art. For example, an electrophoretic gel can be used to separate the bound nucleic acid from the unbound nucleic acid and unbound analyte. The separated bound nucleic acid can then be quantified, e.g., by quantification of a label that is attached to the nucleic acid. The nucleic acid can be labeled by a number of means as described above.

[0185] Such assays are performed in a manner analogous to immunoassays, which likewise detect and/or quantify binding of an antibody to a target analyte. Formats for such binding assays are well known to those of skill in the art and include, but are not limited to, competitive assay formats, non-competitive assay formats, and other formats (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). For a review of general immunoassay and binding assay formats, see also Methods in Cell Biology Volume 37: Antibodies in Cell Biology, Asai, ed. Academic Press, Inc. New York (1993); Basic and Clinical Immunology 7th Edition, Stites & Terr, eds. (1991).

[0186] II. Assembly of Molecular Flip-Flops and Gates.

[0187] A) Binding Protein/Nucleic Acid Design/Selection

[0188] 1) Identification and Selection of Binding Sites.

[0189] Many nucleic acid binding proteins and their cognate binding sites are suitable for the practice of this invention. Generally, a suitable protein will bind to a “substrate” nucleic acid (e.g., single or double stranded, RNA, DNA, peptide nucleic acid, etc.) at a site characterized by a particular nucleotide sequence. This site, designated a protein binding site, can vary in sequence, and it will be appreciated that for any particular binding protein there may exist a number of different nucleotide binding sites that, while still specific for the binding protein, bind that protein with different strengths (see, e.g., Hengen et al. (1997) supra.).

[0190] The binding protein/binding site combinations to be used will be determined by consideration of a number of different factors. These include the number of different binding proteins it is desired to use in the particular system, whether binding at a particular site is to be reversible or irreversible, and what binding site spacing is desirable.

[0191] A large number of suitable binding proteins are known to those of skill in the art. These include, but are not limited to, Fis, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IHF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, NF-kappa-B, ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like.

[0192] Particularly preferred binding proteins are those that, when their binding sites are spaced suitably close together, block binding of the cognate protein of the “adjacent” site. Such proteins can be identified by simple screening. This entails providing a nucleic acid (e.g., a double stranded DNA) containing the binding sites at various spacings and then determining at what spacing the two proteins bind exclusively. An illustration of such an assay is provided in the Examples.

[0193] Appropriate binding site spacings (e.g., overlapping sites) can be engineered using individual information theory (see, e.g., U.S. Ser. No. 08/494,115, filed on Jun. 23, 1995, Hengen et al. (1997) supra., Schneider (1991) J. Theoret. Biol., 148: 125, and Schneider (1997) Nucl. Acids Res., 25: 4408-4415). In one approach, sequence walkers for all the required components are displayed for a sequence. The sequence is then modified while the quantitative effect on each walker is observed (see, e.g., Example 1 and FIG. 8). For example, it is possible to engineer a restriction enzyme site into a Fis site while maintaining the same strength of the Fis site (see, e.g., Schneider (1997) Nucl. Acids Res., 25: 4408-4415). Thus the design of overlapping sites for the logic components is straightforward and can be accomplished computationally.
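Individual information can be sketched in its simplest form, Ri = Σ_l [2 + log2 f(b_l, l)]. Here f(b_l, l) is the frequency of base b_l at position l in a set of known sites. The Python below assumes equiprobable background bases and omits the small-sample correction used in the cited work. The frequency matrix is a toy example, not Fis data. `scan` is a hypothetical helper that evaluates the score at every offset, roughly the quantity a sequence walker displays.

```python
import math
from typing import Dict, List

def ri_bits(seq: str, freqs: List[Dict[str, float]], pseudo: float = 1e-3) -> float:
    """Simplified individual information Ri (bits) of `seq`.

    Ri = sum over positions l of [2 + log2 f(b_l, l)], assuming equiprobable
    background bases and no small-sample correction. `pseudo` (a hypothetical
    floor) keeps log2 finite when a base was never observed.
    """
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the frequency matrix")
    total = 0.0
    for base, column in zip(seq.upper(), freqs):
        total += 2.0 + math.log2(max(column.get(base, 0.0), pseudo))
    return total

def scan(seq: str, freqs: List[Dict[str, float]]) -> List[float]:
    """Ri at every offset along `seq` (one value per possible site start)."""
    w = len(freqs)
    return [ri_bits(seq[i:i + w], freqs) for i in range(len(seq) - w + 1)]

# Hypothetical 3-position frequency matrix, for illustration only.
TOY = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},
]
```

Editing a base and re-running `scan` shows how each candidate site's score changes, which mirrors the walker-guided design described above.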

[0194] Nucleic acid binding proteins whose binding sites appear to be closely associated (e.g., in pairs) in nature are expected to provide particularly good candidates for naturally occurring blocking sites. Searches for the occurrence of such binding sites can be accomplished computationally using deposited (e.g., GenBank) sequence information. Methods of searching for and identifying such binding sites using “Sequence Walkers” are described in Schneider (1997) Nucl. Acids Res., 25: 4408-4415, and in copending application U.S. Ser. No. 08/494,115, filed on Jun. 23, 1995.

[0195] It is expected that for most protein binding sites, the exclusionary spacing will range from about 0 base pairs (complete overlap) to about 60 base pairs, preferably from about 0 base pairs to about 40 base pairs, more preferably from about 0 or 1 base pair to about 20 base pairs, and most preferably from about 7 to about 11 base pairs.

[0196] Where the flip-flop state or the inputs or outputs are to be altered (reset), it is desirable to use nucleic acid binding proteins that bind reversibly to the nucleic acid. Such reversible binding proteins are known to those of skill in the art and include, for example, Fis, LacI, lambda cI, lambda cro, LexA, TrpR, ArgR, AraC, CRP, FNR, OxyR, IHF, GalR, MalT, LRP, SoxR, SoxS, sigma factors, chi, T4 MotA, P1 RepA, p53, and NF-kappa-B, while preferred input and/or output RNA binding proteins include, but are not limited to, ribosomes, T4 regA, spliceosomes (donor and acceptor), polyA binding factor, and the like.

[0197] 2) GTPase-Like Binding Proteins.

[0198] One particularly preferred group of reversibly binding proteins are those for which release can be accomplished with the consumption of an energy source (e.g., hydrolysis of ATP or GTP to ADP or GDP, respectively). One particularly preferred class of such proteins are the GTPase-like or ATPase-like binding proteins. GTPase proteins, e.g., EF-Tu, bind to a nucleic acid (in this case tRNA) and then are released when provided an energy source (e.g., GTP). The proteins may also require a co-factor for release (e.g., GAP), and thus release can be modulated by limiting the supply of either the energy source or the co-factor. Detailed descriptions of GTPase-like proteins can be found in Scheffzek et al. (1997) Science, 277(18): 333-338, and Ahmadian et al. (1997) Nature Structural Biology, 4(9): 686-689.

[0199] The GTPase-like proteins provide a convenient means for setting and resetting the flip-flop. Such a resettable flip-flop is illustrated in FIG. 6. It consists of a nucleic acid having 4 binding sites, designated c, a₁, b, and a₂. Each site x can be bound by σ_(x), which is a nucleic acid binding protein. The σ proteins are GTPase-like proteins that, when triggered, release the nucleic acid. The proteins also contain a GAP finger that can trigger the release of the adjacent σ protein, but that cannot trigger its own release (see, Scheffzek et al. supra. and Ahmadian et al. supra.).

[0200] The σ_(c) protein binds site c and, because of its GAP finger, causes the removal of any σ_(a) bound at site a₁. Thus, when the flip-flop is contacted with σ_(a), the σ_(a) remains bound only at site a₂. The flip-flop is thus stable at site a₂. To switch its state, the flip-flop is contacted with σ_(b), which binds at site b, causing the removal of σ_(a) and leaving the flip-flop stably bound at site b, the second state of the flip-flop. The flip-flop can be reset to state a by contacting it with σ_(a), which binds to sites a₁ and a₂. When a₁ is bound by σ_(a), it causes the release of σ_(b). Although σ_(a) may only be bound transiently at a₁, the free σ_(a) in solution will ensure that the site is occupied long enough to cause the displacement of σ_(b). This leaves the flip-flop reset to state a, with site a₂ bound by σ_(a). The cycle can be repeated indefinitely.
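The set/reset cycle of FIG. 6 can be sketched as a small site-occupancy simulation. This is a hypothetical abstraction that rests on three assumptions:

- the sites lie in the order c, a₁, b, a₂;
- a newly bound σ protein's GAP finger releases a different σ protein on an adjacent site;
- σ_(c) itself stays bound once attached.

None of these rules is a kinetic model of real GTPases, and the names are illustrative.

```python
from typing import Dict, List, Optional

SITES: List[str] = ["c", "a1", "b", "a2"]            # assumed linear order
COGNATE: Dict[str, str] = {"c": "c", "a1": "a", "b": "b", "a2": "a"}
Occupancy = Dict[str, Optional[str]]

def neighbors(site: str) -> List[str]:
    i = SITES.index(site)
    return [SITES[j] for j in (i - 1, i + 1) if 0 <= j < len(SITES)]

def add_protein(occ: Occupancy, protein: str) -> Occupancy:
    """Contact the flip-flop with one sigma protein and apply the release rules."""
    occ = dict(occ)
    # 1. The added protein binds every free site it recognizes.
    newly_bound = [s for s in SITES if COGNATE[s] == protein and occ[s] is None]
    for s in newly_bound:
        occ[s] = protein
    # 2. Each newly bound protein's GAP finger releases a different sigma
    #    protein on an adjacent site (never its own); sigma_c is assumed stable.
    for s in newly_bound:
        for n in neighbors(s):
            if occ[n] not in (None, "c", protein):
                occ[n] = None
    # 3. A bound sigma_c keeps stripping whatever sits on its neighbour a1.
    if occ["c"] == "c":
        for n in neighbors("c"):
            if occ[n] not in (None, "c"):
                occ[n] = None
    return occ

def state(occ: Occupancy) -> str:
    if occ["a2"] == "a" and occ["b"] is None:
        return "a"
    if occ["b"] == "b" and occ["a2"] is None:
        return "b"
    return "undefined"
```

Adding σ_(c), then alternating σ_(a) and σ_(b), toggles `state` between "a" and "b". Site a₁ is left empty after each step, as the paragraph describes.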

[0201] It will be appreciated that non-naturally occurring releasable proteins can be obtained by routine selection procedures. For example, the EF-Tu protein (or other binding proteins, e.g., restriction endonucleases) can be routinely mutagenized and expressed on the surface of a filamentous phage in a “phage display library” (see, e.g., Marks et al. (1991) J. Mol. Biol., 222: 581-597, Vaughn et al. (1996) Nature Biotechnology, 14(3): 309-314, and the like).

[0202] Good releasable proteins can then be routinely identified by screening the library for phage (clones) that bind to a nucleic acid bearing appropriate binding sites and that release in the presence of an energy source (GTP) with or without a cofactor (e.g., GAP). Subsequent rounds of mutagenesis and selection can produce binding proteins that show high affinity and specificity and efficient release, in a manner analogous to the enrichment and selection for antibodies having high specificity and affinity (see, e.g., Vaughn et al. supra, Marks et al. supra.).

[0203] 3) Construction of Nucleic Acid.

[0204] The underlying nucleic acid can be produced according to any of a number of methods well known to those of skill in the art. In one embodiment, the nucleic acid can be an isolated naturally occurring nucleic acid (e.g., a Fis binding site containing segment from E. coli). However, in a preferred embodiment, the nucleic acid is created de novo, e.g., through chemical synthesis.

[0205] Nucleic acids (e.g., oligonucleotides) are typically chemically synthesized according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20): 1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12: 6159-6168. Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255: 137-149. The sequence of the synthetic oligonucleotides can be verified using the chemical degradation method of Maxam and Gilbert (1980) in Grossman and Moldave (eds.) Academic Press, New York, Methods in Enzymology 65: 499-560.

[0206] It will be appreciated that chemically synthesized oligonucleotides are single-stranded. Double-stranded nucleic acids (e.g., for binding proteins such as Fis) can be produced by synthesizing the complementary oligonucleotide and then annealing the two fragments in a simple hybridization reaction according to methods well known to those of skill in the art (see, e.g., Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989).

[0207] Alternatively, a single oligonucleotide can be synthesized having complementary regions. Under appropriate hybridization conditions the oligonucleotide will self-hybridize, forming a hairpin having a double-stranded region containing the desired binding site (see, e.g., Example 2).
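Both the complementary strand of [0206] and the self-complementary hairpin of [0207] follow from reverse complementation of the binding-site sequence. The sketch below assumes a 4-nt poly-T loop and uses a placeholder sequence; neither is taken from the examples, and the helper names are hypothetical.

```python
# Hypothetical helpers for designing duplex or hairpin binding-site oligos.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the strand that anneals antiparallel to `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin(site: str, loop: str = "TTTT") -> str:
    """Site, a loop, then the site's reverse complement, so the
    oligonucleotide can fold back into a double-stranded stem."""
    return site + loop + reverse_complement(site)


site = "GATTCAGCTA"  # placeholder sequence, not a real Fis site
print(reverse_complement(site))  # partner strand for annealing ([0206])
print(hairpin(site))             # single self-hybridizing oligo ([0207])
```

The loop length is a design choice; it only needs to be long enough for the strand to turn back so that the two complementary halves can pair.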

[0208] 4) Construction of Binding Protein.

[0209] a) Isolation

[0210] The nucleic acid binding protein can be isolated from natural sources, mutagenized from isolated proteins, or synthesized de novo. Means of isolating naturally occurring nucleic acid binding proteins are well known to those of skill in the art. Such methods include, but are not limited to, well known protein purification methods including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like (see, generally, R. Scopes, (1982) Protein Purification, Springer-Verlag, N.Y.; Deutscher (1990) Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press, Inc. N.Y.).

[0211] Where the binding protein binds reversibly, nucleic acid affinity columns bearing a nucleic acid having a binding site specific for the protein of interest can be used to affinity purify the protein. Alternatively, the protein can be recombinantly expressed with a HIS-Tag and purified using Ni²⁺/NTA chromatography.

[0212] b) Chemical Synthesis

[0213] In another embodiment, the binding protein can be chemically synthesized using standard chemical peptide synthesis techniques. Where the desired subsequences are relatively short, the molecule may be synthesized as a single contiguous polypeptide. Where larger molecules are desired, subsequences can be synthesized separately (in one or more units) and then fused by condensation of the amino terminus of one molecule with the carboxyl terminus of the other molecule, thereby forming a peptide bond. This is typically accomplished using the same chemistry (e.g., Fmoc, Tboc) used to couple single amino acids in commercial peptide synthesizers.

[0214] Solid phase synthesis, in which the C-terminal amino acid of the sequence is attached to an insoluble support followed by sequential addition of the remaining amino acids in the sequence, is the preferred method for the chemical synthesis of the polypeptides of this invention. Techniques for solid phase synthesis are described by Barany and Merrifield, Solid-Phase Peptide Synthesis; pp. 3-284 in The Peptides: Analysis, Synthesis, Biology. Vol. 2: Special Methods in Peptide Synthesis, Part A; Merrifield, et al. (1963) J. Am. Chem. Soc., 85: 2149-2156; and Stewart et al. (1984) Solid Phase Peptide Synthesis, 2nd ed. Pierce Chem. Co., Rockford, Ill.

[0215] c) Recombinant Expression.

[0216] In a preferred embodiment, the binding proteins are synthesized using recombinant DNA methodology. Generally this involves creating a DNA sequence that encodes the binding protein, placing the DNA in an expression cassette under the control of a particular promoter, expressing the protein in a host, isolating the expressed protein and, if required, renaturing the protein.

[0217] DNA encoding binding proteins or subsequences of this invention can be prepared by any suitable method as described above, including, for example, cloning and restriction of appropriate sequences or direct chemical synthesis by methods such as the phosphotriester method of Narang et al. (1979) Meth. Enzymol. 68: 90-99; the phosphodiester method of Brown et al. (1979) Meth. Enzymol. 68: 109-151; the diethylphosphoramidite method of Beaucage et al. (1981) Tetra. Lett., 22: 1859-1862; and the solid support method of U.S. Pat. No. 4,458,066.

[0218] The amino acid and nucleic acid sequences of literally hundreds of nucleic acid binding proteins are well known to those of skill in the art. Thus, for example, the amino acid sequence of Fis (E. coli Factor for Inversion Stimulation) is found at Swiss-Prot entry P11028. The nucleic acid sequences encoding the desired binding protein(s) may be expressed in a variety of host cells, including E. coli, other bacterial hosts, yeast, and various higher eukaryotic cells such as the COS, CHO and HeLa cell lines and myeloma cell lines. The recombinant protein gene will be operably linked to appropriate expression control sequences for each host. For E. coli this includes a promoter such as the T7, trp, or lambda promoters, a ribosome binding site and preferably a transcription termination signal. For eukaryotic cells, the control sequences will include a promoter and preferably an enhancer derived from immunoglobulin genes, SV40, cytomegalovirus, etc., and a polyadenylation sequence, and may include splice donor and acceptor sequences.

[0219] The plasmids of the invention can be transferred into the chosen host cell by well-known methods such as calcium chloride transformation for E. coli and calcium phosphate treatment or electroporation for mammalian cells. Cells transformed by the plasmids can be selected by resistance to antibiotics conferred by genes contained on the plasmids, such as the amp, gpt, neo and hyg genes.

[0220] Once expressed, the recombinant binding proteins can be purified according to standard procedures of the art as described above.

[0221] One of skill in the art would recognize that after chemical synthesis, biological expression, or purification, the binding protein(s) may possess a conformation substantially different than the conformations of the native polypeptides. In this case, it may be necessary to denature and reduce the polypeptide and then to cause the polypeptide to re-fold into the preferred conformation. Methods of reducing and denaturing proteins and inducing re-folding are well known to those of skill in the art (see, Debinski et al. (1993) J. Biol. Chem., 268: 14065-14070; Kreitman and Pastan (1993) Bioconjug. Chem., 4: 581-585; and Buchner, et al., (1992) Anal. Biochem., 205: 263-270). Debinski et al., for example, describes the denaturation and reduction of inclusion body proteins in guanidine-DTE. The protein is then refolded in a redox buffer containing oxidized glutathione and L-arginine.

[0222] One of skill would recognize that modifications can be made to the binding proteins without diminishing their biological activity. Some modifications may be made to facilitate the cloning, expression, or incorporation of the binding protein into a fusion protein. Such modifications are well known to those of skill in the art and include, for example, a methionine added at the amino terminus to provide an initiation site, or additional amino acids (e.g., poly His) placed on either terminus to create conveniently located restriction sites, termination codons or purification sequences.

[0223] It is also recognized that a large number of binding proteins have been cloned and can be reproduced using standard recombinant DNA methodology. In addition, some (particularly normal and modified restriction enzymes) are commercially available.

[0224] B) Binding Site Selectors/Modulators/Blockers.

[0225] It was noted above that the binding of binding proteins to particular binding sites can be controlled by the use of various selectors (also referred to as modulators or blockers). The selector can be a binding protein or any other moiety that selectively blocks the binding site with which it is associated. Thus, the selector can be a modified restriction endonuclease (e.g., an EcoRI mutated to eliminate the cleavage function; see, e.g., King et al. (1989) J. Biol. Chem., 264(20): 11807-11815 and Wright et al. (1989) J. Biol. Chem., 264(20): 11816-11821) that binds to, but does not cleave, the underlying nucleic acid. The selector could also be a chemical that selectively modifies the underlying nucleic acid (e.g., base modification, thymidine dimerization, etc.) to prevent attachment of the binding protein. Other possible selectors include nucleic acids (e.g., antisense molecules), peptide nucleic acids, streptavidin, avidin, and the like.

[0226] Selectors can also include photocleavable blockers. Such blockers remain attached to the substrate molecule until they are exposed to light of a particular wavelength. Once exposed, they cleave, thereby unblocking the site and allowing the binding of the signal molecule. This permits the use of an optical signal to set the state of various elements of the logic circuit. Similarly, the use of fluorescent readout methods, described above, provides an optical output. Thus, both input and output of the system can be effected by optical signals.

[0227] This is convenient for computational input and output. It is noted that where the computational elements are components of “logic cassettes”, this also provides optical control over gene expression. It is expected that such optical control systems will prove most efficacious in vitro.

[0228] Photocleavable blockers are well known to those of skill in the art and include, but are not limited to, NVOC, MeNPOC, dimethoxybenzoinyl, or DDZ (see, e.g., U.S. Pat. Nos. 5,679,773, 5,639,603, 5,525,735, 5,709,848, 5,556,961, and 5,550,215).

[0229] C) Tethered Activator(s).

[0230] It was explained above that tethering a gene activator (e.g., Gal4) to an “output” nucleic acid binding site provides a mechanism for coupling the output of one logic element (e.g., flip-flop or gate) to another. It has been demonstrated that a number of gene activators will activate a gene even in the absence of the native response element when the activator is tethered to the underlying nucleic acid. This was first demonstrated by Ptashne, who showed that a nucleic acid binding protein (an Escherichia coli repressor protein, LexA) fused to an activator (a Saccharomyces cerevisiae transcriptional activator, Gal4) activated transcription of a gene if and only if a protein binding site (the lexA operator) is present near the transcription start site (see Ptashne (1985) Cell, 43(3): 729-736, and Farrell et al., (1996) Genes Dev., 10(18): 2359-2367).

[0231] Other tethered activators have been made, for example, by fusing a heterologous DNA binding domain (Gal4) to the yeast ADA2 protein (see, Silverman et al. (1993) Proc. Natl. Acad. Sci. USA, 91: 11665-11668). Similarly, fusion of a heterologous DNA binding domain to the amino terminus of CREB-binding protein allowed the chimeric protein to function as a protein kinase A-regulated transcriptional activator (Chrivia et al. (1993) Nature, 365: 855-859). Similarly, the BOB.1/OBF.1 B cell-restricted cofactors fused to the GAL4 DNA binding domain can efficiently activate octamer-dependent promoters in fibroblasts (see, Pfisterer et al. (1995) Biol. Chem., 270(50): 29870-29880). Other gene activators and repressors are well known to those of skill in the art.

[0232] When the binding proteins are joined by a linker, the length of the linker is selected so that when one end of the tethered construct is bound to its cognate target (e.g., the output of a logic gate), the binding protein at the opposite end is brought into close proximity to (e.g., juxtaposed with) the transcription initiation site it is designed to interact with.

[0233] Methods of linking proteins are well known to those of skill in the art. Typically the binding and activator proteins can be linked by a chemically conjugated linker or, alternatively, can be expressed as a fusion protein in which the two proteins are linked by a polypeptide.

[0234] Means of chemically conjugating molecules are well known to those of skill (see, for example, Chapter 4 in Monoclonal Antibodies: Principles and Applications, Birch and Lennox, eds. John Wiley & Sons, Inc. N.Y. (1995), which describes conjugation of antibodies to anticancer drugs, labels including radiolabels, enzymes, and the like). Proteins contain a variety of functional groups, e.g., carboxylic acid (COOH) or free amine (—NH₂) groups, which are available for reaction with a suitable functional group on a suitable linker to bind the protein thereto.

[0235] Alternatively, the binding protein(s) may be derivatized to expose or attach additional reactive functional groups. The derivatization may involve attachment of any of a number of linker molecules such as those available from Pierce Chemical Company, Rockford Ill.

[0236] A “linker”, as used herein, is a molecule that is used to join the binding proteins. The linker is capable of forming covalent bonds to both of the proteins being joined (e.g., the binding protein and the activator). Suitable linkers are well known to those of skill in the art and include, but are not limited to, straight or branched-chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. The linkers may be joined to the constituent amino acids through their side groups (e.g., through a disulfide linkage to cysteine). However, in a preferred embodiment, the linkers will be joined to the alpha carbon amino and carboxyl groups of the terminal amino acids.

[0237] A bifunctional linker having one functional group reactive with a group on a particular agent, and another group reactive with the binding protein, may be used to form the desired tethered construct. Alternatively, derivatization may involve chemical treatment of the binding protein, e.g., glycol cleavage of a sugar moiety attached to the protein with periodate to generate free aldehyde groups. The free aldehyde groups on the protein may be reacted with free amine or hydrazine groups on an agent to bind the agent thereto (see U.S. Pat. No. 4,671,958). Procedures for generation of free sulfhydryl groups on polypeptides are also known (see U.S. Pat. No. 4,659,839).

[0238] Many procedures and linker molecules for attachment of various proteins are known. See, for example, European Patent Application No. 188,256; U.S. Pat. Nos. 4,671,958; 4,659,839; 4,414,148; 4,699,784; 4,680,338; 4,569,789; and 4,589,071; and Borlinghaus et al. Cancer Res. 47: 4071-4075 (1987). Such linkers are widely used in the production of immunotoxins and can be found, for example, in “Monoclonal Antibody-Toxin Conjugates: Aiming the Magic Bullet,” Thorpe et al., Monoclonal Antibodies in Clinical Medicine, Academic Press, pp. 168-190 (1982); Waldmann, Science, 252: 1657 (1991); and U.S. Pat. Nos. 4,545,985 and 4,894,443.

[0239] In a preferred embodiment, the tethered construct is expressed as a recombinant fusion protein (i.e., the two binding domains are joined by a polypeptide linkage). This basically involves providing an expression cassette encoding both binding proteins and, if necessary, a linker, transfecting a cell with the expression cassette, and thereby expressing the tethered construct. Methods of expressing heterologous nucleic acids are described above in the discussion of recombinant expression of binding proteins, and the recombinant expression of binding protein-activator fusion proteins is well known (Silverman et al. (1993) Proc. Natl. Acad. Sci. USA, 91: 11665-11668; Chrivia et al. (1993) Nature, 365: 855-859; and Pfisterer et al. (1995) Biol. Chem., 270(50): 29870-29880).

[0240] It will be appreciated that the binding proteins attached to the activators need not be full-length binding proteins. To the contrary, in a preferred embodiment, only the nucleic acid binding domain will be attached to the termini of the tethered construct.

[0241] III) Solution Phase, Solid Phase and In vivo Systems.

[0242] A) Ex Vivo Systems.

[0243] Where the logic constructs (e.g., flip-flops and/or gates) of this invention are used for computation and/or affinity chromatography, the application will preferably be performed ex vivo. In this context, the computational elements may be utilized in solution and/or they may be attached to a solid support. When attached to a solid support, the computation and/or affinity assay is effectively run in the solid phase. The term “solid support” refers to a solid material which may be functionalized to permit the coupling of the nucleic acid or the binding protein to the surface. However, many solid supports (e.g., nitrocellulose) do not require such derivatization. Any material to which the nucleic acid or binding protein can be attached and which is stable to the reagents with which it will be contacted is suitable. Solid support materials include, but are not limited to, polyacryloylmorpholide, metals, plane glass, silica, controlled pore glass (CPG), polystyrene, polystyrene/latex, and carboxyl modified teflon.

[0244] In one embodiment, the various logic elements can be arranged in arrays. Methods of making single- or double-stranded nucleic acid arrays are well known to those of skill in the art (see, e.g., U.S. Pat. No. 5,143,854 and PCT patent publication No. WO 90/15070).

[0245] In solution or solid phase, the input signal proteins, output signal proteins, blockers, and tethered construct(s) can be added simultaneously or sequentially as needed. Similarly, readout is performed by any of a number of routine means as described above.

[0246] Gene expression may also be performed ex vivo in extracted natural or synthetic expression systems. Such systems typically include a buffer and all of the elements necessary to transcribe and translate a gene (e.g., ATP, Mg, ribosomes, nucleotide triphosphates, etc.).

[0247] Alternatively, the gates of this invention could be added to bioreactors to simultaneously assay and modulate the reactor environment. Thus, for example, the OR gate described above, when added to a bioreactor, will bind one or more analytes if they are present in the reactor system. Such binding will set the output HIGH. The HIGH output binding site can then bind a third analyte in the bioreactor, rendering it less available or unavailable to the organisms growing therein.
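The bioreactor use of the OR gate can be summarized as a truth table. The sketch below treats each input and the output as Boolean and assumes that a HIGH output site sequesters the third analyte completely; in practice the text only claims it becomes "less available or unavailable". Function names are hypothetical.

```python
from itertools import product


def or_gate_high(analyte_1: bool, analyte_2: bool) -> bool:
    """Output is HIGH if either input analyte is bound."""
    return analyte_1 or analyte_2


def third_analyte_available(output_high: bool, third_present: bool) -> bool:
    """Assumes a HIGH output site fully sequesters the third analyte."""
    return third_present and not output_high


for a1, a2 in product((False, True), repeat=2):
    high = or_gate_high(a1, a2)
    print(a1, a2, "HIGH" if high else "LOW", third_analyte_available(high, True))
```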

[0248] B) In Vivo Systems

[0249] In another embodiment, the logic controls of this invention can be used for regulation of gene expression. As explained above, a cell can be transfected with one or more “logic cassettes”: expression cassettes comprising nucleic acids that encode one or more genes whose expression is under the control of a logic element (e.g., one or more gates and/or flip-flops) of this invention.

[0250] The logic cassette can be transfected into a cell using standard vectors as described above. The logic cassette can additionally encode one or more binding proteins and/or one or more tethered constructs. Alternatively, the binding proteins can be endogenously expressed proteins or can be provided by other expression or logic cassettes. Similarly, the tethered constructs can also be expressed by one or more separate expression cassettes or logic cassettes.

[0251] IV. Other Substrate-Based Logic Elements.

[0252] While the examples provided herein demonstrate logic elements (flip-flops and gates) that use nucleic acid binding sites, other substrates are suitable as long as they can be specifically bound. Such substrates include, but are not limited to, proteins, glycoproteins, sugars, and the like. Proteins provide a particularly preferred alternative substrate. It is well known that a single protein can display a wide variety of different epitopes, each of which is specifically bound by a particular antibody (e.g., a monoclonal antibody or antibody fragment Fv, Fab′, etc., or single chain Fv, etc.). Moreover, it is also known that epitopes can be juxtaposed such that they cannot be simultaneously bound by their respective antibodies. This principle forms the basis of epitope mapping.

[0253] Thus, a protein that displays two epitopes for which antibody binding is mutually exclusive forms the basis for a protein substrate flip-flop. Similarly, sugars or glycoproteins can form a substrate for mutually exclusive lectin binding according to the same principle.

[0254] V. Kits for Molecular Computation and/or Complex Expression Control.

[0255] This invention also provides kits for molecular computation and/or for the regulation of gene expression. The kits comprise one or more containers containing the logic elements (e.g., flip-flops, gates, logic cassettes) of this invention. The container(s) may simply contain the underlying nucleic acid or the combined nucleic acid and/or signal polypeptide and/or one or more tethered constructs. Where the kit is designed for ex vivo application, the various elements may be provided attached to one or more surfaces of a solid support (e.g., a 96 well microtiter plate).

[0256] The kit may also optionally include reagents, buffers, fluorescent labels, etc., for the practice of one of the methods described herein.

[0257] The kits may optionally include instructional materials containing directions (i.e., protocols) providing for the use of the logic elements (flip-flops, gates, etc.) of this invention in molecular computation systems, gene control, and the like. While the instructional materials typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses to internet sites that provide such instructional materials.

EXAMPLES

[0258] The following examples are offered to illustrate, but not to limit the claimed invention.

Example 1 Overlapping Fis Sites 7 or 11 Base Pairs Apart Are Not Bound Simultaneously

[0259] In this Example, information theory was used to predict Fis (Factor for Inversion Stimulation) binding sites in Escherichia coli DNAs. These predictions have been confirmed by previously existing DNase I footprints or by gel mobility shift experiments. In many diverse genetic systems, including six site-specific inversion regions, λ att, dif, nrd, ndh and the fis promoter, Fis sites are also predicted to be 7 or 11 base pairs apart. These overlapping Fis sites are frequently coincident with binding sites of other proteins, suggesting that Fis can block access to DNA. The structure of the Fis sequence logo, molecular modeling and gel mobility shift experiments all indicate that Fis sites separated by 7 or 11 bases are bound antagonistically. Two overlapping Fis sites separated by 11 base pairs also occur in the E. coli origin of chromosomal replication (oriC). The data presented herein suggest that both sites bind Fis and that they compete for binding to create two distinguishable molecular states in vitro. Since only one of the two overlapping Fis sites can be bound by Fis at a time, the structure is a molecular flip-flop. These two Fis sites are precisely positioned between two DnaA sites in oriC, suggesting that the flip-flop directs alternative firing of replication complexes in opposite directions.

[0260] Fis is a well characterized site-specific DNA binding protein. When Escherichia coli encounters a rich nutritional medium, the number of Fis molecules increases from nearly zero to 25,000 to 50,000 dimers per cell (Ball et al. (1992) J. Bact. 174: 8043-8056). Estimates of the number of Fis sites in the E. coli genome based on the average information in Fis sites give a similar number, indicating that most of these molecules are controlling genetic systems throughout the genome (Hengen et al. (1997) Nucl. Acids Res., 25(24): 4994-5002). Fis is known to bend DNA and it is involved in many site-specific recombination systems. In addition, it autoregulates its own promoter and activates other promoters (Johnson & Simon (1987) Trends in Genetics, 3: 262-267; Finkel & Johnson (1992) Molec. Microb., 6: 3257-3265; Finkel & Johnson (1992) Molec. Microb., 6: 1023).

[0261] Information analysis of Fis binding sites and their surrounding sequences has revealed previously unidentified sites adjacent to known ones (Hengen et al. (1997) supra.).

[0262] It was observed that pairs of Fis sites are often separated by 7 or 11 bases in many genetic systems. These Fis sites often overlap the binding sites of other proteins in significant places, such as the Xis site of λ att (Schneider (1997) Nucl. Acids. Res., 25: 4408-4415). To understand the significance of these pairs, we sought to determine whether Fis binds cooperatively or antagonistically at the adjacent sites. In this study we show that in an artificial DNA construct the sites cannot be bound simultaneously and therefore act as a molecular flip-flop.

[0263] DNA replication starts at 83 minutes on the E. coli chromosome at a locus called oriC. Bidirectional replication starting at oriC is completed in the terminus region, halfway around the chromosome. Replication is dependent on the DnaA protein, which binds at 5 sites in oriC. Using sequence walkers (Schneider (1997) Nucl. Acids. Res., 25: 4408-4415), we observed that there are likely to be two Fis sites wedged precisely between two of the DnaA sites.

[0264] Results and Discussion

[0265] Self-Competition between Natural Fis Sites.

[0266] When we searched DNA sequences for Fis sites using an information theory based weight matrix, we observed Fis sites spaced 11 base pairs apart in hin, gin, and min (see FIG. 5 in Hengen et al., 1997, supra.). Since B-form DNA completes one helical turn every 10.6 base pairs, the sites should be on the same side of the DNA. While it is conceivable that two adjacent proteins can bind simultaneously by a subtle interleaving of their DNA contacts, it seems more likely that they will compete for binding in the major groove, since after an 11 base shift the sequence logo shows that the predominant C at −7 corresponds to the G at +4 and the C at −4 corresponds to the C at +7 (downward pointing arrows in FIG. 7). Competition between these internally redundant patterns (FIG. 7; Schneider & Mastronarde (1996) Discrete Applied Mathematics, 71: 259-268) would allow Fis to change its site of DNA bending, and perhaps this is important for inversion.
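Scanning with an information theory based weight matrix can be sketched as follows. This is a simplified version of the individual information approach, in which each base b at position l of an aligned site set contributes Riw(b,l) = 2 + log2 f(b,l) bits and a site's score Ri is the sum over its positions. The published method also subtracts a small-sample correction; this sketch replaces it with a pseudocount (an assumption), and the training sites are toy sequences, not real Fis sites.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def weight_matrix(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Per-position weights 2 + log2 f(b, l), in bits, from aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same width")
    matrix = []
    for l in range(width):
        counts = {b: pseudocount for b in BASES}  # pseudocount avoids log2(0)
        for s in sites:
            counts[s[l]] += 1
        total = sum(counts.values())
        matrix.append({b: 2 + math.log2(counts[b] / total) for b in BASES})
    return matrix


def scan(seq: str, matrix: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score (start, Ri in bits) for every window of `seq`."""
    w = len(matrix)
    return [(i, sum(matrix[l][seq[i + l]] for l in range(w)))
            for i in range(len(seq) - w + 1)]


toy_sites = ["ACGT", "ACGT", "ACGA"]  # hypothetical aligned sites
hits = scan("TTACGTTT", weight_matrix(toy_sites))
print(max(hits, key=lambda h: h[1]))
```

Two predicted sites whose start positions differ by 7 or 11 in such a scan correspond to the overlapping pairs discussed in this Example.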

[0267] In contrast, in the P1 cin, P7 cin and E. coli e14 pin sites, the spacing between pairs is only 7 bases, which would place the Fis dimers 122° apart on B-DNA (360° − (360° per turn × 7 bases / 10.6 bases per turn) ≈ 122°). After a 7 base shift, the sequence logo shows that the G at −7 would match the A/T of the minor groove on the opposite face of the DNA at coordinate 0, while the C/T at −4 would match the T/C at +3 and the A/G at −3 would match the C/A at +4 (up arrows in FIG. 7). This allows for the possibility that the two proteins bind at the same time, which might also be important to the function of these regions.
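The helical geometry in [0266] and [0267] reduces to simple arithmetic on the 10.6 bp per turn figure given in the text: the rotation between two sites is 360° × spacing / 10.6, folded to the smaller angle. The sketch below also maps a logo coordinate of one site onto the overlapping site for a given shift. Function names are hypothetical.

```python
HELICAL_REPEAT_BP = 10.6  # base pairs per B-DNA turn, as stated in the text


def angular_offset(spacing_bp: float, repeat_bp: float = HELICAL_REPEAT_BP) -> float:
    """Smallest rotation (degrees) between two sites spacing_bp apart."""
    angle = (360.0 * spacing_bp / repeat_bp) % 360.0
    return min(angle, 360.0 - angle)


def logo_shift(position: int, spacing_bp: int) -> int:
    """Logo coordinate in the second site aligned with `position` in the first."""
    return position + spacing_bp


for spacing in (7, 11, 23):
    print(spacing, round(angular_offset(spacing), 1))

# Correspondences cited from the sequence logo (FIG. 7):
print(logo_shift(-7, 11), logo_shift(-4, 11))  # 11 base shift
print(logo_shift(-7, 7), logo_shift(-4, 7), logo_shift(-3, 7))  # 7 base shift
```

A 7 bp spacing gives about 122°, matching the text, while an 11 bp spacing gives only about 14°, which is why the 11 bp pairs sit on the same face of the DNA.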

[0268] To investigate the consequences of two Fis molecules binding to nearby sites, we constructed three-dimensional models (Schneider (1997) Nucl. Acids Res., 25: 4408-4415). We found that two Fis proteins bound to sites separated by 11 base pairs might strongly interpenetrate. In contrast, a 7 base pair separation might only have a minimal van der Waals force conflict between the two central D helices. This might be accommodated by flexibility of the DNA-protein complex, given that there is some uncertainty as to how Fis binds DNA. We therefore thought that 11-base separated Fis molecules would compete for binding but that a 7-base separation might allow simultaneous binding.

[0269] These ideas are supported by the preliminary observation that synthetic DNAs containing either the hin proximal or medial Fis sites are bound by Fis in electrophoretic mobility gel shift assays (Hengen et al. (1997) supra.). When these overlapping sites were together on the same fragment with a spacing of 11 bases, only one band shift was observed, suggesting that only one of the sites can be bound at a time. Testing whether this is the case requires using high concentrations of Fis and strong Fis sites to ensure that both sites would be bound if that were possible.

[0270] Test of the 7-11 Alternation Model

[0271] To determine whether overlapping Fis sites can be simultaneously bound by Fis, we synthesized strong Fis sites that overlap by either 7 or 11 base pairs (FIG. 8a, 8b) or that were separated by 23 base pairs (FIG. 8c) and tested their properties by gel shift. Neither the 11 nor the 7 overlapping sites showed a doubly-shifted band, even at an extremely high Fis/DNA ratio and with exceptionally strong (>12 bits) Fis binding sites (FIG. 9), suggesting that only one Fis molecule could bind to each DNA fragment. The DNA fragment with two Fis sites separated by 23 base pairs did double shift, demonstrating that well separated Fis sites can cause two distinct band shifts (FIG. 9). However, Fis can create a ladder on non-specific sequences (Betermier et al. (1994) Biochimie, 76: 958-967), and this might account for the double shifts. Under our conditions with short DNAs, a non-specific (all positions <1 bit) 66 bp DNA fragment barely shifted at high Fis concentration (data not shown), so the secondary shifts were not from non-specific binding. These results demonstrate that Fis sites separated by 7 or 11 bases cannot be bound simultaneously. We could not exclude the possibility that the single-shifted bands at high concentration of Fis contain two molecules of Fis per DNA as a complex that runs exactly as a DNA bound by one Fis molecule, but this seems highly unlikely given the sensitivity of gel shifts to molecular weight changes and the results for the 23 base separated DNA.
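Why the absence of a double shift at very high Fis concentration is diagnostic can be seen from a two-site binding partition function. For independent, non-cooperative sites the statistical weights are 1, K₁[F], K₂[F] and K₁K₂[F]², so the doubly bound species dominates as [F] grows; for mutually exclusive sites the last term is absent and the DNA saturates as a singly bound species. The association constants and concentration below are hypothetical, chosen only to represent strong sites and excess Fis.

```python
from typing import Dict


def species_fractions(k1: float, k2: float, free_fis: float,
                      exclusive: bool) -> Dict[str, float]:
    """Fractions of DNA unbound, singly bound, or doubly bound.

    k1, k2: association constants (1/M) of the two sites (hypothetical).
    free_fis: free Fis dimer concentration (M).
    exclusive: True if the two sites cannot be occupied at the same time.
    Independent sites are assumed non-cooperative.
    """
    x1, x2 = k1 * free_fis, k2 * free_fis
    double = 0.0 if exclusive else x1 * x2  # weight of the doubly bound state
    z = 1.0 + x1 + x2 + double              # partition function
    return {"free": 1.0 / z, "single": (x1 + x2) / z, "double": double / z}


for exclusive in (False, True):
    print("exclusive" if exclusive else "independent",
          species_fractions(1e9, 1e9, 1e-6, exclusive))
```

Under these assumptions, independent sites are almost entirely doubly bound (a second band), while exclusive sites give essentially one shifted band, the pattern observed for the 7 and 11 base overlaps.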

[0272] A second Fis molecule might be blocked by direct steric hindrance, but it is also possible for the first protein to distort the DNA enough to eliminate or occlude the second site. If a distortion mechanism is used, it remains possible that two weak Fis sites could be bound simultaneously. Further, the 7 base overlapped sites may be simultaneously bound at lower temperature, since low thermal agitation might allow binding despite some mechanical strain. Finally, superhelical DNA and other conditions might allow simultaneous binding.

[0273] Fis Switching: Genetic Implications of the 7-11 Flip-Flop Model

[0274] The tyrT promoter has three Fis sites separated by 20 and 31 base pairs, comparable to the well separated sites of our 23 base pair control experiment (FIG. 8 and FIG. 9). The separation in tyrT is sufficient for three Fis dimers to simultaneously position themselves on the same face of the DNA to cooperatively bind a σ⁷⁰ subunit and activate transcription of stable RNA promoters (Muskhelishvili et al. (1995) EMBO J., 14: 1446-1452). In addition to this activation mechanism, which is based on separated sites, Fis may also have evolved another control mechanism that uses overlapping sites.

[0275] When we scanned our Fis individual information model across various sequences, we discovered 7 and 11 spacings at inversion regions, the fis, nrd, and ndh promoters, and at dif, E. coli oriC and λ att (Hengen et al., 1997 supra; Schneider (1997) Nucleic Acids Res., 25: 4408-4415). In the latter three systems, Fis sites overlap binding sites of other proteins in significant places, so we do not think that Fis sites appear at this spacing merely because of the internal redundancy of the site. For example, scanning with the Fis weight matrix reveals two Fis sites previously identified in oriC at coordinates 202 and 213 (Roth et al. (1994) Biochimie, 76: 917-923). Footprinting data from two different groups show protection covering one, the other and both sites (FIG. 10). The two Fis sites fit exactly between the R2 and R3 DnaA sites and have similar individual information contents, suggesting that their binding energies are similar, so in the absence of other effects Fis could occupy them for nearly equal fractions of the time as a flip-flop. Binding by DnaA and by Fis is mutually exclusive (Gille et al. (1991) Nucl. Acids. Res., 19: 4167-4172), implying that the position of a Fis-induced DNA bend could be controlled by DnaA and the binding of DnaA could be controlled by Fis. During nutritional upshifts, when there is a high Fis concentration (Ball et al. (1992) supra.), occupancy of one Fis site should ensure that only one DnaA site is available at a time. Since absence of Fis leads to asynchronous replication (Boye et al. (1982) In DNA Replication and the Cell Cycle, Fanning, Knippers and Winnacker, eds., vol. 43: 15-26, Springer-Verlag, Berlin), this flip-flop might control alternative firing of replication complexes in opposite directions.
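The "nearly equal fractions of the time" argument can be made concrete for two mutually exclusive sites under saturating Fis, where one site is always filled. Each site is then occupied in proportion to its association constant, K₁/(K₁ + K₂), which in free-energy terms is 1/(1 + exp((ΔG₁ − ΔG₂)/RT)). The binding free energies below are hypothetical, and the sketch does not attempt to convert individual information (bits) into energies.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def site1_fraction(dg1: float, dg2: float, temp_k: float = 298.15) -> float:
    """Fraction of time site 1 is the occupied site when exactly one of
    two mutually exclusive sites is bound (free energies in kcal/mol)."""
    rt = R_KCAL * temp_k
    return 1.0 / (1.0 + math.exp((dg1 - dg2) / rt))


# Hypothetical energies: equal sites split occupancy evenly; a small
# difference biases the flip-flop toward the stronger site.
print(site1_fraction(-11.0, -11.0))
print(site1_fraction(-11.0, -10.8))
```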

[0276] Closely spaced sites are often bound cooperatively, as in the classical example of T4 gene 32 autogenous regulation (Miller et al. (1994) In Molecular Biology of Bacteriophage T4, Karam et al., ed. pp. 193-205, American Soc. Microbiol., Washington, D.C.). In contrast, Fis represents the unusual situation where a protein competes with itself by binding at overlapping positions. Self-occlusion has been observed in artificial constructs, where one ribosome is apparently blocked by the presence of another ribosome bound nearby (Barrick et al. (1994) Nucl. Acids Res., 22: 1287-1295). Likewise, in ColE1 and ColE7, a pair of LexA sites may be competing with each other for LexA binding (Ebina et al. (1983) J. Biol. Chem., 258: 13258-13261; Lu & Chak (1996) Mol. Gen. Genet., 251: 407-411).

[0277] The same pair of LexA sites in ColE7 may also compete with a 9.1 bit Fis site immediately upstream, and all three of these sites sit adjacent to two overlapping IHF sites immediately downstream. This interplay of factors may be typical of more complex flip-flop mechanisms. For example, as many as five Fis sites are likely to be present in λ att. Two of these are spaced 11 base pairs apart, with one of them overlapping an Xis site (Schneider (1997) Nucl. Acids Res., 25: 4408-4415).

[0278] The positioning of Fis binding sites relative to one another and to the binding sites of other proteins therefore appears to be key to the ability of Fis to perform many diverse functions. Fis has evolved a transcriptional activation mode in which sites are on the same face of the DNA and are far enough apart to be bound simultaneously. Fis may also have specifically evolved to allow for two competitive binding modes. When the sites are on the same face of the DNA (11 bp apart), a single Fis molecule could disengage and rebind to move the bend location between two possible places without changing the overall direction of the DNA. When the sites are on nearly opposite faces (7 bp apart), Fis would cause the bend direction to change by 122°. How these cogs fit into the larger picture of pleiotropic Fis functions remains to be determined.
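The face relationships above follow from the helical twist of B-DNA. As a sketch, assuming a helical repeat of 10.6 bp per turn (our assumption; the text does not state the value used), the angular offset between two sites is the spacing times 360°/10.6, reduced to the range 0° to 180°. With this value the calculation reproduces the 122° quoted for 7 bp spacing; with 10.5 bp per turn it gives 120°.

```python
def face_offset_degrees(spacing_bp, bp_per_turn=10.6):
    """Smallest angle (0 to 180 degrees) between two sites `spacing_bp` apart.

    bp_per_turn = 10.6 is an assumed B-DNA helical repeat, not a value
    taken from the source.
    """
    angle = (spacing_bp * 360.0 / bp_per_turn) % 360.0
    return min(angle, 360.0 - angle)


# Spacings discussed in this Example (overlapping, separated, and tyrT sites).
for spacing in (7, 11, 20, 23, 31):
    print(spacing, round(face_offset_degrees(spacing), 1))
```

Under this assumption, sites 11 bp apart are only about 14° apart around the helix (effectively the same face), whereas sites 7 bp apart are about 122° apart.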

[0279] This Example demonstrates that adjacent Fis sites can have two distinct binding modes in which Fis competes with itself for binding and therefore acts as a molecular flip-flop.

[0280] Materials and Methods

[0281] Sequence Analysis Programs

[0282] Delila system programs were used for handling sequences and information calculations (Schneider et al. (1982) Nucl. Acids Res., 10: 3013-3024; Schneider et al. (1984) Nucl. Acids Res., 12: 129-140; Schneider et al. (1986) J. Mol. Biol., 188: 415-431; Schneider & Stephens (1990) Nucl. Acids Res., 18: 6097-6100; Stephens & Schneider (1992) J. Mol. Biol., 228: 1124-1136; Schneider (1997) J. Theoret. Biol., 189(4): 427-441; Schneider (1997) Nucl. Acids Res., 25: 4408-4415). Figures were generated automatically from raw GenBank data using Delila and UNIX script programs.
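The individual information calculations performed with Delila can be sketched as follows. This is a minimal illustration, not Delila's code: it uses a per-position weight of the form $R_{iw}(b,l) = 2 - (-\log_2 f(b,l))$ bits, as we understand it from Schneider (1997) J. Theoret. Biol., replaces the published small-sample correction with a +1 pseudocount (our simplification), and builds the matrix from a hypothetical four-site toy alignment rather than from Fis data. A sequence's individual information $R_i$ is the sum of $R_{iw}$ over its positions, and scanning slides that sum along a longer sequence.

```python
import math

BASES = "acgt"


def riw_matrix(aligned_sites):
    """R_iw(b,l) = 2 - (-log2 f(b,l)) bits for each base b at each position l.

    f(b,l) is estimated with a +1 pseudocount per base (an assumption);
    the published small-sample correction is omitted in this sketch.
    """
    n = len(aligned_sites)
    matrix = []
    for pos in range(len(aligned_sites[0])):
        column = [site[pos] for site in aligned_sites]
        matrix.append({b: 2 + math.log2((column.count(b) + 1) / (n + 4)) for b in BASES})
    return matrix


def ri(sequence, matrix):
    """Individual information R_i: sum of R_iw over the positions of one site."""
    return sum(matrix[pos][b] for pos, b in enumerate(sequence))


def scan(sequence, matrix):
    """R_i for every window of a longer sequence, indexed by window start."""
    width = len(matrix)
    return [ri(sequence[i:i + width], matrix) for i in range(len(sequence) - width + 1)]


# Hypothetical 4 bp toy alignment (NOT Fis data), for illustration only.
toy_sites = ["acgt", "acgt", "acga", "tcgt"]
m = riw_matrix(toy_sites)
print(round(ri("acgt", m), 2))
scores = scan("ggacgtgg", m)
print(scores.index(max(scores)))
```

For this toy matrix the site "acgt" scores 4.64 bits, and the scan places the best window at index 2 of "ggacgtgg", where that site begins.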

[0283] Design of Fis Binding Experiments

[0284] Synthetic DNAs containing strong Fis sites separated by 11 and 7 base pairs were designed by selecting the most frequent base at each position in the Fis sequence logo (Hengen et al. (1997) supra). These were then merged with the same sequence shifted by 11 or 7 base pairs by comparing the $R_{iw}(b,l)$ values for the various choices of base. (Note: the consensus sequence of the early model we used was TTTG(G/C)TCAAAATTTGA(G/C)CAAA (SEQ ID NO: 4), which differs from that of the logo.) Five extra bases were added to the ends, based on the natural sequences around the hin proximal and medial sites for the overlap 11 oligo and around the cin external and proximal sites for the overlap 7 oligo (Hengen et al. (1997) supra). The DNAs were made self-complementary (FIG. 8a, 8b). Sites separated by 23 bases were created by starting with the 11 base separated DNA and duplicating the central overlap region. A BamHI site was also inserted, and the DNA was flanked by EcoRI sites (FIG. 10c). Oligonucleotides were synthesized with biotin on the 5′ end and gel purified (Oligos Etc., Wilsonville, Oreg., USA). To ensure thorough annealing, they were heated to 90° C. for 10 minutes and slowly cooled to room temperature. The annealed products were electrophoresed through an 8% (w/v) polyacrylamide gel, and the bands corresponding to linear duplex DNA of the correct size were excised from the gel. DNA was recovered by electroelution and extracted with isoamyl alcohol to remove ethidium bromide. A non-specific control DNA was composed of the two 66 bp HinfI fragments from bacteriophage φX174 (Life Technologies, Inc.). Gel mobility shift experiments were performed as described (Hengen et al. (1997) supra).
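The design constraints in this paragraph can be checked directly against the sequence listing. In this sketch (the helper names are ours) each synthetic DNA is tested for self-complementarity, and the spacing between its two 21 bp Fis sites is measured by locating them with `str.find`.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq):
    """Reverse complement of a lower-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Synthetic DNAs and their two 21 bp Fis sites, copied from the sequence listing.
DESIGNS = {
    "overlap 11": ("tattctttgctcaaaatttgatcaaattttgagcaaagaata",      # SEQ ID NO: 2
                   "tttgctcaaaatttgatcaaa", "tttgatcaaattttgagcaaa"),  # SEQ ID NOs: 6, 7
    "overlap 7": ("aggcttttgctcaaagtttaaactttgagcaaaagcct",           # SEQ ID NO: 3
                  "tttgctcaaagtttaaacttt", "aaagtttaaactttgagcaaa"),   # SEQ ID NOs: 8, 9
    "separated 23": ("ggaattctttgctcaaaatttgatcaggatcctgatcaaattttgagcaaagaattcc",  # SEQ ID NO: 5
                     "tttgctcaaaatttgatcagg", "cctgatcaaattttgagcaaa"),  # SEQ ID NOs: 10, 11
}


def site_spacing(dna, site_a, site_b):
    """Start-to-start (equivalently center-to-center) distance between two sites."""
    start_a, start_b = dna.find(site_a), dna.find(site_b)
    if start_a < 0 or start_b < 0:
        raise ValueError("site not found")
    return start_b - start_a


for name, (dna, site_a, site_b) in DESIGNS.items():
    print(name, len(dna), dna == revcomp(dna), site_spacing(dna, site_a, site_b))
```

Each design reports `True` for self-complementarity, with site spacings of 11, 7 and 23 bp. In every pair the second site is also the reverse complement of the first, as expected when two sites sit symmetrically in a self-complementary duplex.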

Example 2 Readout of a Fis/DNA Flip-Flop

[0285] A single very long nucleic acid is synthesized with a sequence in its center that causes a hairpin loop to form rapidly (see FIG. 11). The DNA can then be dissolved in a buffer, heated, and cooled, thereby forming double-stranded DNA. Because both strands are part of one molecule, this guarantees that the complementary strands are equimolar and that no single-stranded DNA is present in the mixture.

[0286] The nucleic acid is designed so that it forms the oriC site on hairpin formation, and a biotin is attached (via a 19 atom linker) to the T at either position 77 or 78.

[0287] Fis is added to the hairpin-loop DNA. Fis is expected to bind at two positions (18 and 29 in FIG. 11; note that the Fis sites at 87 and 98 are the same sites on the other strand of the DNA). Streptavidin is then added.
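The coordinates used in this Example can be checked against SEQ ID NO: 15. In this sketch the stem boundaries (3 to 56 paired with 60 to 113, leaving a 5′ "aa" overhang and a "gaa" loop at 57 to 59) come from our own reading of the sequence, not from the source, and the helper names are hypothetical. Positions are 1-based, as in FIG. 11.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

# SEQ ID NO: 15 (113 nt), split across two string literals.
HAIRPIN = ("aacgggatccactcaaaaactgaacaacagttgttcgaattcctcgagcgatcggcgaag"
           "ccgatcgctcgaggaattcgaacaactgttgttcagtttttgagtggatcccg")

# Assumed stem arms (1-based, inclusive), from inspection of the sequence.
STEM_5P = (3, 56)
STEM_3P = (60, 113)


def revcomp(seq):
    """Reverse complement of a lower-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_pairs(seq):
    """True if the two stem arms are exact Watson-Crick complements."""
    five = seq[STEM_5P[0] - 1:STEM_5P[1]]
    three = seq[STEM_3P[0] - 1:STEM_3P[1]]
    return five == revcomp(three)


def partner(pos):
    """1-based position paired with `pos` in the folded stem."""
    return STEM_5P[0] + STEM_3P[1] - pos


def site_center(seq, site):
    """1-based center of the first occurrence of an odd-length site."""
    start = seq.find(site)
    if start < 0:
        raise ValueError("site not found")
    return start + 1 + len(site) // 2


print(len(HAIRPIN), stem_pairs(HAIRPIN))
print(site_center(HAIRPIN, "tccactcaaaaactgaacaac"),   # SEQ ID NO: 16
      site_center(HAIRPIN, "actgaacaacagttgttcgaa"))   # SEQ ID NO: 17
print(partner(18), partner(29), partner(77), partner(78))
```

On this reading the sites centered at 18 and 29 pair with those centered at 98 and 87, and the biotinylated T at 77 or 78 lies inside the 21 bp site centered at 87 (positions 77 to 97), the other-strand copy of the site at 29. That placement is consistent with Fis at position 29 blocking streptavidin.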

[0288] When a Fis molecule is bound at position 18, streptavidin can also bind, and one should see a high band-shifted complex consisting of DNA, Fis and streptavidin. When a Fis molecule is bound at position 29, it will block streptavidin, so only DNA and Fis will be present and the band will be lower on the gel. Visibility of both bands will indicate that both binding modes occur in solution.

[0289] Appropriate controls include knockouts of each binding site individually and of both sites. Experiments include DNA alone, DNA+Fis, DNA+Fis+Streptavidin, and DNA+Streptavidin+Fis, to test whether the order of addition affects the results. Two bands should appear only with the DNA+Fis+Streptavidin order of addition when both Fis sites are present.
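The expected gel outcomes in paragraphs [0288] and [0289] can be tabulated with a small rule-based sketch. It assumes saturating Fis and streptavidin, that Fis at 18 leaves the biotin accessible, and that Fis at 29 blocks it, all as stated in the text. It further assumes, as our inference from the statement that only the Fis-first order yields two bands, that streptavidin bound first occludes the site at 29. The function name and band labels are hypothetical.

```python
def expected_bands(has_site18, has_site29, additions):
    """Complexes expected on a mobility-shift gel for the hairpin readout.

    has_site18 / has_site29: False models a knockout of that Fis site.
    additions: reagents in order of addition, e.g. ("Fis", "streptavidin").
    Assumes saturating reagents and the blocking rules described in the text.
    """
    has_fis = "Fis" in additions
    has_sa = "streptavidin" in additions
    sites = [s for s, present in ((18, has_site18), (29, has_site29)) if present]
    if has_fis and has_sa and additions.index("streptavidin") < additions.index("Fis"):
        # Streptavidin already on the biotin occludes the site at 29 (inferred).
        sites = [s for s in sites if s != 29]
    if not has_fis or not sites:
        return {"DNA+streptavidin" if has_sa else "DNA"}
    bands = set()
    for site in sites:
        if site == 18 and has_sa:
            bands.add("DNA+Fis+streptavidin")  # upper band: both proteins bound
        else:
            bands.add("DNA+Fis")  # no streptavidin added, or Fis at 29 blocking it
    return bands


orders = [(), ("Fis",), ("Fis", "streptavidin"), ("streptavidin", "Fis")]
for s18 in (True, False):
    for s29 in (True, False):
        for order in orders:
            bands = expected_bands(s18, s29, order)
            print(s18, s29, order, len(bands), sorted(bands))
```

Enumerating the knockouts and orders of addition this way, the only condition predicted to give two bands is both sites present with Fis added before streptavidin, matching the expectation stated above.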

[0290] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

Number of sequences: 19

SEQ ID NO: 1 (20, DNA, Artificial Sequence): consensus sequence of early model of Factor for Inversion Stimulation (Fis) binding site
ttgstcaaaa tttgascaaa 20

SEQ ID NO: 2 (42, DNA, Artificial Sequence): paired Factor for Inversion Stimulation (Fis) binding sites with 11 bp spacing; overlap 11
tattctttgc tcaaaatttg atcaaatttt gagcaaagaa ta 42

SEQ ID NO: 3 (38, DNA, Artificial Sequence): paired Factor for Inversion Stimulation (Fis) binding sites with 7 bp spacing; overlap 7
aggcttttgc tcaaagttta aactttgagc aaaagcct 38

SEQ ID NO: 4 (15, DNA, Artificial Sequence): sequence logo for Factor for Inversion Stimulation (Fis) binding site
gctcaaaatt tgatc 15

SEQ ID NO: 5 (58, DNA, Artificial Sequence): Factor for Inversion Stimulation (Fis) binding sites separated by 23 bp; separated 23
ggaattcttt gctcaaaatt tgatcaggat cctgatcaaa ttttgagcaa agaattcc 58

SEQ ID NO: 6 (21, DNA, Artificial Sequence): 18.1 bit Fis site
tttgctcaaa atttgatcaa a 21

SEQ ID NO: 7 (21, DNA, Artificial Sequence): 18.1 bit Fis site
tttgatcaaa ttttgagcaa a 21

SEQ ID NO: 8 (21, DNA, Artificial Sequence): 12.7 bit Fis site
tttgctcaaa gtttaaactt t 21

SEQ ID NO: 9 (21, DNA, Artificial Sequence): 12.7 bit Fis site
aaagtttaaa ctttgagcaa a 21

SEQ ID NO: 10 (21, DNA, Artificial Sequence): 15.0 bit Fis site
tttgctcaaa atttgatcag g 21

SEQ ID NO: 11 (21, DNA, Artificial Sequence): 15.0 bit Fis site
cctgatcaaa ttttgagcaa a 21

SEQ ID NO: 12 (46, DNA, Escherichia coli): origin of replication (oriC)
gttatacaca actcaaaaac tgaacaacag ttgttctttg gataac 46

SEQ ID NO: 13 (15, DNA, Artificial Sequence): Fis site separated by 11 bases; 9.1 bit Fis site
gaacaacagt tgttc 15

SEQ ID NO: 14 (15, DNA, Artificial Sequence): Fis site separated by 11 bases; 8.4 bit Fis site
actcaaaaac tgaac 15

SEQ ID NO: 15 (113, DNA, Artificial Sequence): synthesized single very long nucleic acid with hairpin loop DNA
aacgggatcc actcaaaaac tgaacaacag ttgttcgaat tcctcgagcg atcggcgaag 60
ccgatcgctc gaggaattcg aacaactgtt gttcagtttt tgagtggatc ccg 113

SEQ ID NO: 16 (21, DNA, Artificial Sequence): 8.4 bit Fis site
tccactcaaa aactgaacaa c 21

SEQ ID NO: 17 (21, DNA, Artificial Sequence): 10.0 bit Fis site
actgaacaac agttgttcga a 21

SEQ ID NO: 18 (21, DNA, Artificial Sequence): 10.0 bit Fis site
ttcgaacaac tgttgttcag t 21

SEQ ID NO: 19 (21, DNA, Artificial Sequence): 8.4 bit Fis site
gttgttcagt ttttgagtgg a 21

What is claimed is:
 1. A system comprising an isolated nucleic acid having a length of at least 5 base pairs and having a nucleotide sequence that encodes a first protein binding site and a second protein binding site where said first and second protein binding sites are spaced in proximity to each other such that: when said first protein binding site is specifically bound by a protein, said second binding site cannot be bound by a protein that otherwise specifically recognizes and binds said second binding site; and when said second binding site is specifically bound by a protein, said first binding site cannot be bound by a protein that otherwise specifically recognizes and binds said first binding site, and a nucleic acid binding protein that specifically binds said first protein binding site or said second protein binding site.
 2. The composition of claim 1, wherein said nucleic acid is a double-stranded nucleic acid.
 3. The composition of claim 1, wherein said nucleic acid is a deoxyribonucleic acid (DNA).
 4. The composition of claim 1, wherein said first binding site and said second binding site have the same nucleotide sequence.
 5. The composition of claim 4, wherein said first binding site and said second binding site have the nucleotide sequence of SEQ ID NO: 1.
 6. The composition of claim 1, wherein said first binding site or said second binding site is specifically recognized and bound by a protein selected from the group consisting of Fis, and Tus.
 7. The composition of claim 1, wherein said first binding site or said second binding site is bound by EF-tu.
 8. The composition of claim 1, wherein said first binding site is within 20 nucleotides of said second binding site.
 9. The composition of claim 1, wherein said first binding site is within 1 nucleotides of said second binding site.
 10. The composition of claim 8, wherein said first binding site has a strength of at least 2.4 bits as determined by individual information theory.
 11. The composition of claim 1, wherein the difference in strength between said first protein binding site and said second protein binding site is at least 0 bits as determined by individual information theory.
 12. The composition of claim 1, further comprising a third protein binding site wherein said third site is in proximity to said first protein binding site or to said second protein binding site such that specific binding of said third binding site with a protein precludes specific protein binding of said first or said second protein binding sites.
 13. The composition of claim 1, wherein: said first protein binding site is a Fis binding site; said second protein binding site is a Fis binding site; and said binding sites are separated from each other by less than 12 nucleotide base pairs.
 14. The composition of claim 13, wherein said nucleic acid is a deoxyribonucleic acid comprising the sequence of SEQ ID NO: 2 or SEQ ID NO: 3.